Android 10 and 11 Benchmarks and ARM big.LITTLE Architecture Issues


Contents


Introduction Configurations Whetstone Benchmark
Dhrystone Benchmark Linpack Benchmark Livermore Loops Benchmark
MemSpeed Benchmark NeonSpeed Benchmark BusSpeed Benchmark
RandMem Benchmark FFT Benchmarks MP-Whetstone Benchmark
MP-Dhrystone Benchmark MP-BusSpeed Benchmark MP-RandMem Benchmark
MP-MFLOPS Benchmark NEON-MFLOPS-MP Benchmark Java OpenGL Benchmark
Java Drawing Benchmark Java Whetstone Benchmark Java Linpack Benchmark
DriveSpeed Benchmark CPU Stress Tests Integer Stress Benchmark
Floating Point Stress Benchmark Integer Stress Tests Floating Point Stress Tests
More Integer Stress Tests More Floating Point Stress Tests


Summary

Recently, my phone’s Operating System was upgraded to Android 10. My first use was to verify that my existing benchmarks ran successfully, and to confirm that there were no changes in performance or calculated results.

I then bought a new phone, with advanced hardware technology, that uses Android 11, to also verify that the benchmarks would run successfully.

Both phones use ARM big.LITTLE Architecture, where big implies more advanced CPU facilities. The latest device has a 2+6 CPU cores arrangement, at maximum claimed clock speeds of 2.0 and 1.8 GHz respectively, with the older one at 4+4, all at 2.0 GHz.

10 Single Core Benchmarks - These provide more than 300 performance comparisons, with only one slightly faster on the old phone and average performance of the new one mainly 1.4 to 2.4 times faster.

6 Multi-Core Benchmarks - These produce 52 results and comparisons from using each of 1, 2, 4 and 8 threads. In this case, performance was highly dependent on big.LITTLE architecture, technology vintage, CPU MHz and core count differences, besides program code complexity. Then, the 2+6 device only demonstrated its across the board superiority using one and two threads. Comparative 2+6 over 4+4 performance ratios over all results were:

Threads    min    max     av

    1     0.94   2.73   1.61
    2     0.80   4.36   1.96
    4     0.44   5.41   1.64
    8     0.67   4.59   1.43
 

SIMD MFLOPS Benchmarks - The two fastest cores of the 2+6 configuration performed particularly well running floating point calculations, at around 12 GFLOPS per core, approaching the hoped for maximum of 8 operations per clock cycle.

5 Other Benchmarks - The 27 test results cover Java processing and graphics, with the later device performing better on all, up to more than twice the speed. The main drive benchmark reading speed indicated up to 279 MB/second on the old phone and 450 MB/second on the new one, but with one out of three at a third of that.

Stress Tests - Integer and Floating Point Stress Tests have run time parameters to use up to 32 threads, 3 data sizes, covering caches and RAM, and running time. The latter test also has 3 choices of floating point operations per data word read.

Stress Test Benchmarks - Benchmarking modes provide tests, over the whole parameter range, to help in deciding stress testing procedures, with 18 results for integers and 36 for floating point (up to 8 threads). Performance differences between the two phones were similar to the other benchmarks, with the latest faster using one or two threads but sometimes slower with higher thread counts.

Stress Tests - The first stress tests were run for 15 minutes with 8 threads and using shared data to fit in L1 caches. Results shown are average performance and sample 8 core MHz measurements at 30 second intervals. The phone, using the newer technology, effectively ran at maximum speed during both integer and floating point tests, but the older one suffered from CPU MHz throttling (demonstrated). In this case, the new phone was somewhat faster (int 12% FP 21%) at the start, increasing to 41% and 46% long before the 15 minute end time.

The above stress tests were repeated using 2 and 4 threads and with 32 threads using RAM based data. With integer functions, phone 2/1 performance comparisons were constant at 2 threads and 32 threads using RAM, but the old phone moved from faster to slower using 4 threads. The floating point tests indicated that phone 1 suffered from varying amounts of MHz throttling in all cases, with phone 2 always performing better.



Introduction

In 2018, I published this ResearchGate PDF report, with background and details of the small change required for my benchmarks to run under Android 8, with appropriate references and links to earlier programs and results. Later, I repeated the tests under Android 9 and reported the results, including details from others, at ResearchGate in Android 9 Benchmarks and Stress Tests On 32 Bit and 64 Bit CPUs.pdf.

The documents from both of the above links provide the options to independently download and install all the programs used, and also include detailed descriptions, not provided here.

Recently, my phone was upgraded to Android 10, where I verified that the programs operated correctly. Then I bought another phone, using Android 11, with more advanced facilities. Both use ARM big.LITTLE architecture, the original with a 4 + 4 arrangement and the new one unbalanced at 2 + 6. This led to some unusual performance comparisons running multithreaded benchmarks.

With the original benchmarks, the only way I could find to report computer readable results, in the standard format, was via Email. Then, on selecting the Email or Save button, I was the default addressee. This became clearly unacceptable with increasing security concerns. Later Android updates provided alternative options to divert the logged output. Now I select the Google Drive option, allowing me to access the files on my PCs.

The programs provide the following range of activities, the actual testing functions being mainly produced using the same C code as my Windows and Linux benchmarks.

CPU Benchmarks - The first set are the Classic Benchmarks that were the original 1970s to 1980s programs that set standards of performance for computers, comprising Whetstone, Dhrystone, Linpack and Livermore Loops.

Memory Benchmarks - Next are programs that measure performance with data from caches and RAM. MemSpeed (including NeonSpeed variant), BusSpeed and RandMem all use the same range of data sizes between 4 KB and 64 MB. Then there is a Fast Fourier Transform benchmark with multiple data sizes.

MultiThreading Benchmarks - These all measure performance using 1, 2, 4 and 8 threads. The first are MP-Whetstone, MP-Dhrystone and MP-Linpack. The next batch use memory sized 12.8 KB, 128 KB and mainly 12.8 MB, comprising MP-MFLOPS (including NEON-MFLOPS MP), MP-BusSpeed and MP-RandMem.

Java Benchmarks - These comprise Java versions of the Whetstone and Linpack benchmarks, a graphics one using drawing functions and another using OpenGL.

DriveSpeed Benchmarks - For measuring main drive speeds.

CPU Stress Testing Programs - These have variable parameters to run MP benchmarks for extended periods, for identifying overheating and discharging battery performance issues.

Result Comparisons - The main comparisons are for relative performance of the two systems. Most programs either check that the results of calculations are correct or the same as initial values. When these are included in the logged results, they have also been compared, with 1.000000000 indicating identical values.
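
As an illustration of how the System 2/1 comparison columns in this report can be read, the following minimal sketch divides a System 2 value by the matching System 1 value. It is not taken from the benchmark source; the function name is hypothetical, and the sample numbers are the N1 float speed and numeric result from the Whetstone logs later in this report.

```python
def ratio(system2: float, system1: float) -> float:
    """Return the System 2/1 ratio of two logged values."""
    return system2 / system1

# Whetstone N1 float: MFLOPS and calculated result from each log
speed_ratio = ratio(1068.97, 455.57)
result_ratio = ratio(-1.124750137, -1.124750137)

print(f"N1 float speed  {speed_ratio:.2f}")   # performance gain
print(f"N1 float result {result_ratio:.9f}")  # 1.000000000 means identical
```

A speed ratio above 1.00 means System 2 was faster, while a result ratio of exactly 1.000000000 confirms both phones produced the same numeric answer.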



Configurations

Many ARM processors have options for different sizes of L1, L2 and L3 caches and whether they are shared by multiple processor cores. Also, it is difficult to discover cache sizes in a particular device. Those shown below are estimates based on performance variations observed in the following memory benchmarks.

 System 1 - Motorola One Macro Phone
 CPUs - 4 x 2.0 GHz Cortex-A73 and 4 x 2.0 GHz Cortex-A53, 12 nm
 A73 and A53 caches L1 64 KB, L2 1 MB shared, No L3
 GPU Mali-G72 MP3

 Program Reported System Information
 Device Motorola one macro
 Screen pixels w x h 720 x 1339 
 Android Build Version      10

processor	: 6 7
BogoMIPS	: 26.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd09
CPU revision	: 2

processor	: 3
BogoMIPS	: 26.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

 System 2 - Motorola G50 Phone   
 SOC Snapdragon 750G
 CPUs - 2 x 2.0 GHz Kryo 480 and 6 x 1.8 GHz Kryo 460, 8 nm
 Said to be based on Cortex-A76 and Cortex-A55
 Both Kryo caches L1 64 KB, L2 512 KB, L3 2 MB shared
 GPU Adreno 619 450 MHz

 Program Reported System Information
 Device Motorola moto g(50)
 Screen pixels w x h 720 x 1339 
 Android Build Version      11

processor	: 5
BogoMIPS	: 38.40
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x51
CPU architecture: 8
CPU variant	: 0xd
CPU part	: 0x805
CPU revision	: 14

processor	: 7
BogoMIPS	: 38.40
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer	: 0x51
CPU architecture: 8
CPU variant	: 0x8
CPU part	: 0x804
CPU revision	: 14


  


Whetstone Benchmark - NativeWhetstone2.apk

The Whetstone benchmark carries out both single precision floating point and integer calculations, the overall MWIPS rating being mainly dependent on the former. System 2 provided performance gains all round between 1.26 and 2.35 times.


 System 1 Android 10

 ARM/Intel Native Whetstone Benchmark 4A8 22-Jul-2021 21.48
           Compiled for 64 bit ARM v8a

 Test        MFLOPS    MOPS   millisecs    Results

 N1 float    455.57              0.042   -1.124750137
 N2 float    548.24              0.245   -1.131330490
 N3 if               2378.96     0.044    1.000000000
 N4 fixpt            2478.69     0.127   12.000000000
 N5 cos                91.66     0.908    0.499109805
 N6 float    495.56              1.088    0.999999821
 N7 equal            1049.83     0.176    3.000000000
 N8 exp                47.34     0.786    0.935364604

 MWIPS      2927.45              3.416
 
Total Elapsed Time   16.5 seconds

 System 2 Android 11

ARM/Intel Native Whetstone Benchmark 4A8 24-Jul-2021 18.14
           Compiled for 64 bit ARM v8a

 Test        MFLOPS    MOPS   millisecs    Results

 N1 float   1068.97              0.018   -1.124750137
 N2 float    884.76              0.152   -1.131330490
 N3 if               3008.47     0.034    1.000000000
 N4 fixpt            5014.23     0.063   12.000000000
 N5 cos               142.47     0.584    0.499109805
 N6 float    800.99              0.673    0.999999821
 N7 equal            2005.59     0.092    3.000000000
 N8 exp                69.70     0.534    0.935364604

 MWIPS      4650.43              2.150

 Total Elapsed Time   16.2 seconds

 System 2/1 Comparison

 N1 float     2.35                        1.000000000
 N2 float     1.61                        1.000000000
 N3 if                  1.26              1.000000000
 N4 fixpt               2.02              1.000000000
 N5 cos                 1.55              1.000000000
 N6 float     1.62                        1.000000000
 N7 equal               1.91              1.000000000
 N8 exp                 1.47              1.000000000

 MWIPS        1.59
  


Dhrystone Benchmark - Dhrystone2i.apk

The Dhrystone integer benchmark produces a performance rating in VAX MIPS (AKA DMIPS). System 2 was indicated as being 1.69 times faster than System 1. Results are often quoted as DMIPS per MHz, in this case 4.16 for System 1 and 7.03 for System 2, based on the 2000 MHz maximum clock speed.
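
The ratings can be cross-checked with a short sketch. This is an illustration, not the benchmark's own code: it assumes the conventional baseline of 1757 Dhrystones per second for a 1 MIPS VAX-11/780, and that the test ran on a core clocked at the claimed 2000 MHz maximum. Function names are hypothetical.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # assumed conventional VAX-11/780 baseline
CLOCK_MHZ = 2000                  # assumed: claimed maximum clock of both phones

def vax_mips(dhrystones_per_second: float) -> float:
    """Convert Dhrystones per second to a VAX MIPS (DMIPS) rating."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND

def dmips_per_mhz(dmips: float, mhz: float = CLOCK_MHZ) -> float:
    """Normalise a DMIPS rating by CPU clock speed."""
    return dmips / mhz

# Dhrystones per second as logged for System 1 and System 2
for name, dps in (("System 1", 14614554), ("System 2", 24688271)):
    rating = vax_mips(dps)
    print(f"{name}: {rating:.0f} VAX MIPS, {dmips_per_mhz(rating):.2f} DMIPS/MHz")
```

If the benchmark thread actually ran at a lower clock speed, the DMIPS per MHz figures would be correspondingly higher.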

 System 1 Android 10

 ARM/Intel Dhrystone 2 Benchmark 4A8 22-Jul-2021 21.45

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          68 
 Dhrystones per Second            14614554 
 VAX MIPS rating                      8318 

 System 2 Android 11

ARM/Intel Dhrystone 2 Benchmark 4A8 24-Jul-2021 20.59

           Compiled for 64 bit ARM v8a

 Nanoseconds one Dhrystone run          41 
 Dhrystones per Second            24688271 
 VAX MIPS rating                     14051 

 System 2/1 Comparison

 VAX MIPS rating                      1.69
   



Linpack Tests - LinpackDP2.apk, LinpackSP2.apk, NEON-Linpacki.apk

The Linpack benchmark speed is measured in MFLOPS, the original being for double precision (DP) floating point calculations. The single precision (SP) version was produced as the early ARM processors did not include SIMD DP instructions. NEON SP SIMD operations were included later. Results for this Linpack benchmark code should not be compared with those from the High Performance Linpack (HPL) benchmark.

System 2/1 performance ratios varied between 1.77 and 2.15, the latter via using NEON intrinsic functions.

 System 1 Android 10

 ARM/Intel DP Linpack Benchmark 4A8 16-Jun-2021 23.59 
           Compiled for 64 bit ARM v8a               
                                                     System 2/1
 Speed             1121.87 MFLOPS                    Comparison

 norm. resid                 1.7
 resid            7.41628980e-14
 machep           2.22044605e-16
 x[0]-1          -1.49880108e-14
 x[n-1]-1        -1.89848137e-14

 System 2 Android 11

 ARM/Intel DP Linpack Benchmark 4A8 24-Jul-2021 21.00
           Compiled for 64 bit ARM v8a

 Speed             1985.71 MFLOPS                         1.77

 norm. resid                 1.7                       1.00000
 resid            7.41628980e-14                       1.00000
 machep           2.22044605e-16                       1.00000
 x[0]-1          -1.49880108e-14                       1.00000
 x[n-1]-1        -1.89848137e-14                       1.00000

 System 1 Android 10

 ARM/Intel SP Linpack Benchmark 4A8 17-Jun-2021 00.00
           Compiled for 64 bit ARM v8a

 Speed             1116.97 MFLOPS

 norm. resid                 1.6
 resid            3.80277634e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06

 System 2 Android 11

 ARM/Intel SP Linpack Benchmark 4A8 24-Jul-2021 21.01
           Compiled for 64 bit ARM v8a

 Speed             2144.87 MFLOPS                         1.92

 norm. resid                 1.6                       1.00000
 resid            3.80277634e-05                       1.00000
 machep           1.19209290e-07                       1.00000
 x[0]-1          -1.38282776e-05                       1.00000
 x[n-1]-1        -7.51018524e-06                       1.00000

 System 1 Android 10

ARM NEON Linpack Benchmark 4A8 22-Jul-2021 22.17
           Compiled for 64 bit ARM v8a

 Speed             2146.12 MFLOPS

 norm. resid                 1.6
 resid            3.80277634e-05
 machep           1.19209290e-07
 x[0]-1          -1.38282776e-05
 x[n-1]-1        -7.51018524e-06

 System 2 Android 11

 ARM NEON Linpack Benchmark 4A8 24-Jul-2021 21.24
           Compiled for 64 bit ARM v8a

 Speed             4620.54 MFLOPS                         2.15

 norm. resid                 1.6                       1.00000
 resid            3.80277634e-05                       1.00000
 machep           1.19209290e-07                       1.00000
 x[0]-1          -1.38282776e-05                       1.00000
 x[n-1]-1        -7.51018524e-06                       1.00000 



Livermore Loops Benchmark - LivermoreLoops2.apk

The Livermore Loops comprise 24 kernels of numerical applications with speeds calculated in MFLOPS (double precision). A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. They are repeated three times at different array dimension spans.

Below are MFLOPS scores for the 24 kernels, at one data span, and overall ratings of Maximum, Average, Geometric mean, Harmonic mean and Minimum MFLOPS. System 2 improvements for the 24 loops were between 1.54 and 2.38 times, with the official (Geometric) average at 1.87 times.

This was the benchmark used to evaluate relative performance of the first supercomputers at Livermore Laboratory, where the Cray 1 was purchased for $7 Million in 1978. Then, the 24 loops geometric mean speed was 11.9 MFLOPS. The System 2 phone, costing some $200 in 2021, was 123 times faster. Also, the Cray 1 weighed 10,500 pounds and had a 115 kilowatt power supply.
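
The summary statistics can be sketched as follows, using the System 2 MFLOPS for the 24 loops at span 471 from the log below. This is an illustration only: the logged overall figures are weighted across three spans, so the means computed here from a single span are not expected to match them. The function name is hypothetical.

```python
import math

def summarise(mflops):
    """Return (maximum, arithmetic, geometric, harmonic, minimum) of the speeds."""
    n = len(mflops)
    arithmetic = sum(mflops) / n
    geometric = math.exp(sum(math.log(v) for v in mflops) / n)
    harmonic = n / sum(1.0 / v for v in mflops)
    return max(mflops), arithmetic, geometric, harmonic, min(mflops)

# System 2, MFLOPS for 24 loops, Do Span 471
system2 = [2577.1, 1851.7, 1597.1, 1633.0,  773.7, 1402.3,
           2552.8, 2943.6, 2725.4, 1858.0,  962.5, 2080.3,
            513.6,  740.7, 1355.8, 1525.6, 1484.4, 2586.3,
            699.1, 1891.1, 1733.1, 1288.2, 1517.8,  658.1]

stats = summarise(system2)
print(" Maximum Average Geomean Harmean Minimum")
print("".join(f"{v:8.1f}" for v in stats))

# Logged System 2 overall geometric mean against the Cray 1 figure
cray_ratio = 1467.8 / 11.9
print(f"System 2 / Cray 1 = {cray_ratio:.0f}")
```

The geometric mean is always between the harmonic and arithmetic means, which is why it is less affected by a few very fast or very slow loops.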

 System 1 Android 10

 ARM/Intel Livermore Loops Benchmark 4A8 22-Jul-2021 22.42
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  1410.7   899.2   878.3   869.0   494.6   711.3
  1655.2  1816.5  1713.6   845.2   495.8  1030.7
   274.5   466.7   658.8   776.5   931.6  1261.7
   455.5   796.2   947.2   742.1   894.9   374.9

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
  1816.5   877.3   786.1   699.6   269.2

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time    9.1 seconds

 System 2 Android 11

 ARM/Intel Livermore Loops Benchmark 4A8 24-Jul-2021 21.02
           Compiled for 64 bit ARM v8a

  MFLOPS for 24 loops Do Span 471
  2577.1  1851.7  1597.1  1633.0   773.7  1402.3
  2552.8  2943.6  2725.4  1858.0   962.5  2080.3
   513.6   740.7  1355.8  1525.6  1484.4  2586.3
   699.1  1891.1  1733.1  1288.2  1517.8   658.1

 Overall Weighted MFLOPS Do Spans 471, 90, 19
 Maximum Average Geomean Harmean Minimum
  2943.6  1620.0  1467.8  1310.5   513.6

 Results of last two calculations
   4.850340602749970e+02  1.300000000000000e+01

 Total Elapsed Time    8.8 seconds

 System 2/1 Comparison

  MFLOPS for 24 loops Do Span 471
    1.83    2.06    1.82    1.88    1.56    1.97
    1.54    1.62    1.59    2.20    1.94    2.02
    1.87    1.59    2.06    1.96    1.59    2.05
    1.63    2.38    1.83    1.74    1.70    1.76

 Maximum Average Geomean Harmean Minimum
    1.62    1.85    1.87    1.87    1.91

 Results of last two calculations were identical
  



MemSpeed Benchmark - MemSpeedi.apk

This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision (DP and SP) floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing DP MB/second by 8 and 16, for the two tests, and SP speeds by 4 and 8.

The System 2/1 comparisons indicate all round gains for System 2, lowest for RAM based data and best from L3/L2 shared caches.
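
The MB/second to MFLOPS conversion described above can be sketched as below. The divisors are those given in the text; the function and key names are hypothetical. The sample inputs are the fastest Dble and Sngl readings of the first test from each log.

```python
# Bytes read per floating point operation, as given in the text
DIVISORS = {
    ("DP", "x+s*y"): 8,   # double precision, multiply and add
    ("DP", "x+y"): 16,    # double precision, add only
    ("SP", "x+s*y"): 4,   # single precision, multiply and add
    ("SP", "x+y"): 8,     # single precision, add only
}

def mflops(mb_per_second: float, precision: str, test: str) -> float:
    """Convert a MemSpeed reading speed in MB/second to MFLOPS."""
    return mb_per_second / DIVISORS[(precision, test)]

print(f"System 1 DP {mflops(11497, 'DP', 'x+s*y'):.0f}  SP {mflops(9750, 'SP', 'x+s*y'):.0f}")
print(f"System 2 DP {mflops(14074, 'DP', 'x+s*y'):.0f}  SP {mflops(12493, 'SP', 'x+s*y'):.0f}")
```

These values correspond to the Max MFLOPS lines printed at the end of each log.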

 System 1 Android 10

 ARM/Intel MemSpeed Benchmark 4A8 19-Jun-2021 10.15
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  11458   9732  10965  12118   8065   8177  L1
      32  11497   9750  10976  12078   8050   8149
      64  11449   9724  10927  12216   8065   8184
     128   9730   8731   9266   9712   7230   7280  L2
     256   9308   8964   9247   9082   7312   7438
     512   9292   8985   9277   9244   7441   7488
    1024   8375   8098   8341   8394   6877   6896
    4096   6333   6268   6304   6302   6051   6085  RAM
   16384   6242   6235   6196   6261   6057   5969
   65536   6345   6270   6303   6304   6059   6157

          Total Elapsed Time    9.5 seconds
 Max
 MFLOPS    1437   2438

 System 2 Android 11

 ARM/Intel MemSpeed Benchmark 4A8 24-Jul-2021 22.23
           Compiled for 64 bit ARM v8a

              Reading Speed in MBytes/Second
  Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
  KBytes   Dble   Sngl    Int   Dble   Sngl    Int
      16  14069  12493  13326  26051  13150  12846 L1
      32  14074  12343  13339  26128  13038  12846
      64  14057  12178  13329  25900  12540  12700
     128  13456  11975  13002  21384  12183  12453 L2
     256  13209  11801  12676  20306  12026  12150
     512  13118  11803  12616  20400  12001  12151
    1024  13221  11859  12751  20865  12074  12232 L3 2 MB
    4096  10233  10003  10096   9517   9830   8972 RAM
   16384   8208   8429   8312   7815   8001   7543
   65536   7912   7935   7918   7442   7665   7400

          Total Elapsed Time   11.7 seconds
 Max
 MFLOPS   1759    3123 

 System 2/1 Comparison

 KBytes   Dble   Sngl    Int   Dble   Sngl    Int     Average
      16   1.23   1.28   1.22   2.15   1.63   1.57       1.51
      32   1.22   1.27   1.22   2.16   1.62   1.58       1.51
      64   1.23   1.25   1.22   2.12   1.55   1.55       1.49
     128   1.38   1.37   1.40   2.20   1.69   1.71       1.63
     256   1.42   1.32   1.37   2.24   1.64   1.63       1.60
     512   1.41   1.31   1.36   2.21   1.61   1.62       1.59
    1024   1.58   1.46   1.53   2.49   1.76   1.77       1.76
    4096   1.62   1.60   1.60   1.51   1.62   1.47       1.57
   16384   1.31   1.35   1.34   1.25   1.32   1.26       1.31
   65536   1.25   1.27   1.26   1.18   1.27   1.20       1.24
  



NeonSpeed Benchmark - NeonSpeedi.apk

This benchmark carries out the same calculations as the MemSpeed Benchmark, except they are all in single precision, for comparison with NEON sections. The latter are carried out using NEON intrinsic functions.

System 2 is indicated as being faster on all tests, by between 1.11 and 9.24 times, the best results being from SIMD NEON instructions, where the maximum single precision MFLOPS was the highest I have recorded so far. Based on experience with Intel processors and 128 bit registers, containing four 32 bit words, the maximum possible speed, with a 2 GHz clock, could be 8 GFLOPS, or 16 GFLOPS with fused (or linked) multiply and add. The 9.69 GFLOPS, shown below, indicates some involvement of fusing. A later example recorded 12.8 GFLOPS using 32 floating point operations per data word read.
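
The peak speed reasoning can be written as a small calculation. This sketch assumes, as the text does, one 128 bit SIMD instruction completed per clock cycle, with a fused multiply and add counted as two operations per word; real cores may issue more or fewer instructions per cycle. The function name is hypothetical.

```python
def peak_gflops(clock_ghz, register_bits=128, word_bits=32, fused=False):
    """Estimate peak GFLOPS assuming one SIMD instruction per clock cycle."""
    words_per_register = register_bits // word_bits
    ops_per_cycle = words_per_register * (2 if fused else 1)
    return clock_ghz * ops_per_cycle

plain = peak_gflops(2.0)              # add or multiply only
fused = peak_gflops(2.0, fused=True)  # fused multiply and add

# System 2 NEON v=v+s*v at 16 KB: MB/second divided by 4, then to GFLOPS
measured = 38764 / 4 / 1000

print(f"Peak {plain:.1f} GFLOPS, fused {fused:.1f} GFLOPS")
print(f"Measured {measured:.2f} GFLOPS, {100 * measured / fused:.0f}% of fused peak")
```

A measured speed above the non-fused estimate is what suggests that fused operations were being used.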

 System 1 Android 10

 ARM NeonSpeed Benchmark 4A8 22-Jul-2021 22.18
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16   9241  14779  10559  13864  12703  13179 L1
      32   9283  14825  11015   7090   5829   6142
      64   4227   6772   4989   6313   9442  13709
     128   8515   9179   9198   8787  10010  10032 L2
     256   8772   8545   9332   9425   9399   9408
     512   8665   8437   9302   9342   9316   9337
    1024   7657   7105   8002   8129   8075   8105
    4096   6113   6126   6189   6133   6239   6234 RAM
   16384   6084   6123   6167   6099   6159   6158
   65536   6361   6225   6386   5940   6416   6468

          Total Elapsed Time    9.5 seconds
 Max
 MFLOPS    2321   3706 

 System 2 Android 11

 ARM NeonSpeed Benchmark 4A8 24-Jul-2021 22.28
           Compiled for 64 bit ARM v8a

       Vector Reading Speed in MBytes/Second
  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int
      16  12793  38764  13458  42313  53790  53849 L1
      32  12798  38713  13463  42599  53877  53788 
      64  12784  38353  13452  42230  44110  44309
     128  12608  28514  13361  28752  28891  28856 L2
     256  12394  27811  13169  27813  27915  27954
     512  12459  27508  13224  27615  27969  27972
    1024  12457  25436  13155  25224  25289  25075 L3 2 MB
    4096  10213   7808  10226   9124   9539   9401 RAM
   16384   8161   7470   8209   7302   7569   7575
   65536   7850   7269   7782   6699   7245   7208

          Total Elapsed Time   10.4 seconds
 Max
 MFLOPS    3200   9691

 System 2/1 Comparison

  KBytes   Norm   Neon   Norm   Neon  Float    Int    Average
      16   1.38   2.62   1.27   3.05   4.23   4.09       2.78
      32   1.38   2.61   1.22   6.01   9.24   8.76       4.87
      64   3.02   5.66   2.70   6.69   4.67   3.23       4.33
     128   1.48   3.11   1.45   3.27   2.89   2.88       2.51
     256   1.41   3.25   1.41   2.95   2.97   2.97       2.50
     512   1.44   3.26   1.42   2.96   3.00   3.00       2.51
    1024   1.63   3.58   1.64   3.10   3.13   3.09       2.70
    4096   1.67   1.27   1.65   1.49   1.53   1.51       1.52
   16384   1.34   1.22   1.33   1.20   1.23   1.23       1.26
   65536   1.23   1.17   1.22   1.13   1.13   1.11       1.17
   



BusSpeed Benchmark - BusSpeedv7i.apk

This benchmark is designed to identify reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum bus speed can be estimated by multiplying the Inc16 value by 16. Then, for each half reduction in increments, a near doubling of MB/second could be expected. This is not the case here, between 2 word and 1 word increments, with System 2 being the worst. However, see the MP-BusSpeed results, suggesting that access by multiple cores is necessary to obtain maximum memory throughput.

Data speeds from caches can also increase on reducing addressing increments, suggesting burst reading, with System 2 indicating improved performance. Normally, only Read All comparisons are calculated. In this case, System 2 is indicated as being slower from L2 to L3 caches, due to those address increment complications. The benchmark probably requires longer running times for greater accuracy.
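
The addressing pattern and the bus speed estimate can be sketched as follows. This models only which words and 64 byte bursts are touched, not the timing, and is not the benchmark's own code; the function names are hypothetical and the 64 byte burst size is the typical value quoted above.

```python
WORD_BYTES = 4    # the benchmark reads 4 byte words
BURST_BYTES = 64  # typical burst size quoted in the text

def access_pattern(data_bytes, increment_words):
    """Return (words read, distinct 64 byte bursts touched) for one pass."""
    total_words = data_bytes // WORD_BYTES
    indices = range(0, total_words, increment_words)
    bursts = {(i * WORD_BYTES) // BURST_BYTES for i in indices}
    return len(indices), len(bursts)

def estimated_max_bus_speed(inc16_mb_per_second):
    """Apply the rule from the text: Inc16 speed multiplied by 16."""
    return inc16_mb_per_second * 16

for inc in (32, 16, 8, 4, 2, 1):
    words, bursts = access_pattern(16 * 1024, inc)
    print(f"Inc{inc:<2} words read {words:5d}  bursts touched {bursts:4d}")

print(estimated_max_bus_speed(642), "MB/second System 1")
print(estimated_max_bus_speed(875), "MB/second System 2")
```

At Inc16, every burst is fetched but only one of its 16 words is counted, which is the reason for multiplying the measured speed by 16.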
 
 System 1 Android 10

 ARM/Intel BusSpeed Benchmark 4A8 22-Jul-2021 21.51
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   6064   6524   7171   7538   7557   7891 L1
      32   5126   5101   5123   5375   3670   3608
      64   1970   2149   2914   3612   4857   4712 
     128    546    551   1212   2676   4973   7888 L2
     256   1009   1010   2178   3450   5325   7909
     512   1007   1005   2160   3540   5231   7728
    1024    513    574   1694   2154   3976   7124
    4096    581    606   1461   2784   4981   7561 RAM
   16384    580    614   1430   2793   4917   7557
   65536    612    642   1375   2712   4851   7482

          Total Elapsed Time    5.0 seconds

 Max Bus Speed? 642 x 16 = 10272 MB/second

 System 2 Android 11

 ARM/Intel BusSpeed Benchmark 4A8 24-Jul-2021 22.26
           Compiled for 64 bit ARM v8a

    Reading Speed 4 Byte Words in MBytes/Second
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
  KBytes  Words  Words  Words  Words  Words    All
      16   6850   6978   7653   7907   7947   7948 L1
      32   7559   7652   7675   7948   7958   7948
      64   6229   6271   7799   7932   7950   7943
     128   1863   3264   5571   7823   7911   7946 L2
     256   1414   2208   4398   7319   7883   7937
     512   1006   1839   3650   7219   6000   7056
    1024    919   1587   3093   5806   7779   6552 L3 2 MB
    4096    645   1059   2165   4078   7285   7867 RAM
   16384    583    875   1825   3688   7089   7874
   65536    569    852   1750   3585   6938   7881

          Total Elapsed Time    5.3 seconds

 Max Bus Speed? 875 x 16 = 14000 MB/second
   
 System 2/1 Comparison

  KBytes  Words  Words  Words  Words  Words    All    Average
      16   1.13   1.07   1.07   1.05   1.05   1.01       1.06
      32   1.47   1.50   1.50   1.48   2.17   2.20       1.72
      64   3.16   2.92   2.68   2.20   1.64   1.69       2.38
     128   3.41   5.92   4.60   2.92   1.59   1.01       3.24
     256   1.40   2.19   2.02   2.12   1.48   1.00       1.70
     512   1.00   1.83   1.69   2.04   1.15   0.91       1.44
    1024   1.79   2.76   1.83   2.70   1.96   0.92       1.99
    4096   1.11   1.75   1.48   1.46   1.46   1.04       1.38
   16384   1.01   1.43   1.28   1.32   1.44   1.04       1.25
   65536   0.93   1.33   1.27   1.32   1.43   1.05       1.22
  



RandMem Benchmark - RandMemi.apk

RandMem benchmark carries out four tests comprising serial and random address selections using the same program structure, with read and read/write tests, where the data read points to the next address, with no arithmetic calculations. The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches.

System 2 was clearly faster on most tests, best cases affected by caching influence. The exceptions were due to the strange behaviour with serial reading at 512 and 1024 KB. The benchmark was repeated twice on System 2, confirming the same problem.
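
The idea of data that points to the next address can be illustrated with a small index chain. This is a sketch of the access pattern only, not the benchmark's C code, and running it in Python will not reproduce the cache effects measured here. Function names and the fixed seed are hypothetical.

```python
import random

def build_chain(n, randomise, seed=1):
    """Return a list where chain[i] holds the index to visit after i."""
    order = list(range(n))
    if randomise:
        random.Random(seed).shuffle(order)
    chain = [0] * n
    for pos, index in enumerate(order):
        chain[index] = order[(pos + 1) % n]
    return chain

def follow(chain, start, steps):
    """Chase the chain for `steps` loads and return the indices visited."""
    visited = []
    index = start
    for _ in range(steps):
        visited.append(index)
        index = chain[index]
    return visited

serial = build_chain(8, randomise=False)
scattered = build_chain(8, randomise=True)
print("serial ", follow(serial, 0, 8))
print("random ", follow(scattered, 0, 8))
```

Both chains visit every word exactly once, so any speed difference in a compiled version comes from the order of access rather than the amount of data read.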

 System 1 Android 10

 ARM/Intel RandMem Benchmark 4A8 22-Jul-2021 22.19
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16     8911     7308     8913     7281 L1
       32     8939     7321     8941     7310
       64     8925     7308     8904     7305
      128     9024     7305     3297     3359 L2
      256     9180     7313     2282     2408
      512     9222     7170     1946     2087
     1024     6969     6039      509      679
     4096     8450     4821      165      194 RAM
    16384     8463     4850      138      160
    65536     8453     4865      133      156

          Total Elapsed Time    8.9 seconds

 System 2 Android 11

  ARM/Intel RandMem Benchmark 4A8 24-Jul-2021 22.32
           Compiled for 64 bit ARM v8a

    MBytes/Second Transferring 4 Byte Words  
   Memory     Serial.......     Random.......
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16    14412    15286    14038    13428 L1
       32    14464    15290    14042    13427
       64    14544    15199    13983    13342
      128    12595    13193     8580     7691 L2
      256    12522    13113     4874     4973
      512     9747    11936     1962     2467
     1024     9016    12918     1230     1566 L3 2 MB
     4096    12135     6933      527      530 RAM
    16384    11805     5987      408      411
    65536    12029     5626      386      382

          Total Elapsed Time    8.4 seconds

 System 2/1 Comparison

   Memory   Serial  .......   Random  .......    Average
   KBytes     Read   Rd/Wrt     Read   Rd/Wrt
       16     1.62     2.09     1.58     1.84       1.78
       32     1.62     2.09     1.57     1.84       1.78
       64     1.63     2.08     1.57     1.83       1.78
      128     1.40     1.81     2.60     2.29       2.02
      256     1.36     1.79     2.14     2.07       1.84
      512     1.06     1.66     1.01     1.18       1.23
     1024     1.29     2.14     2.42     2.31       2.04
     4096     1.44     1.44     3.19     2.73       2.20
    16384     1.39     1.23     2.96     2.57       2.04
    65536     1.42     1.16     2.90     2.45       1.98
 



FFT Benchmarks - fft1.apk, fft3c.apk

The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results provided are running times in milliseconds. Besides Android, the benchmarks are available to run via Windows and Linux. Two versions are available, FFT1 being the original version and FFT3c having optimised C code.

Memory used increases with FFT size, up to sizes where data is read from RAM, and is often accessed on a skipped sequential basis, leading to burst reading effects, as in the RandMem random access tests. Again, there were all round System 2 performance improvements, this time between 1.26 and 2.90 times for the original benchmark and between 1.22 and 2.43 times for the optimised one.
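
The Average and Compare columns in the tables below can be reproduced approximately from the three timings for each size. This sketch uses the 1K results from both logs; because it works from the rounded millisecond values as printed, the last digit might differ from the logged figures in other rows. Names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of the repeated timings."""
    return sum(values) / len(values)

# Milliseconds for the 1K FFT, three runs each, from the FFT Benchmark 1 logs
system1 = {"SP": [0.049, 0.046, 0.045], "DP": [0.041, 0.041, 0.041]}
system2 = {"SP": [0.032, 0.029, 0.028], "DP": [0.029, 0.028, 0.028]}

# Times are compared, so the System 2/1 gain is System 1 time / System 2 time
speedup = {p: mean(system1[p]) / mean(system2[p]) for p in system1}

for precision, gain in speedup.items():
    print(f"1K {precision} System 2 faster by {gain:.2f} times")
```

Since the results are running times rather than speeds, the comparison divides the System 1 time by the System 2 time.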

 System 1 Android 10

 ARM/Intel FFT Benchmark 1 4A8 22-Jul-2021 21.56
           Compiled for 64 bit ARM v8a

 Size                     milliseconds                             Average
    K    Single Precision           Double Precision              SP       DP
    1    0.049    0.046    0.045    0.041    0.041    0.041    0.047    0.041
    2    0.098    0.098    0.097    0.088    0.087    0.087    0.098    0.087
    4    0.210    0.229    0.211    0.199    0.197    0.197    0.217    0.198
    8    0.472    0.469    0.470    0.577    0.575    0.577    0.470    0.576
   16    1.264    1.259    1.262    1.341    1.354    1.341    1.262    1.345
   32    2.884    2.856    2.868    3.044    3.016    3.002    2.869    3.021
   64    6.333    6.370    6.332   12.711   12.730   12.613    6.345   12.685
  128   23.624   23.341   23.459   58.103   80.489   72.876   23.475   70.489
  256  135.025  130.145  126.713  177.673  190.005  181.423  130.628  183.034
  512  332.983  334.722  340.869  458.987  423.731  439.251  336.191  440.656
 1024  826.862  830.598  769.849 1027.110  988.710  981.118  809.103  998.979

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time   10.4 seconds

 System 2 Android 11 and Comparisons

 ARM/Intel FFT Benchmark 1 4A8 24-Jul-2021 22.30
           Compiled for 64 bit ARM v8a
                                                                                   Benchmark 1
 Size                     milliseconds                              Average     Compare Sys 2/1
    K    Single Precision           Double Precision              SP       DP       SP       DP
    1    0.032    0.029    0.028    0.029    0.028    0.028    0.030    0.028     1.57     1.45
    2    0.061    0.060    0.060    0.061    0.060    0.060    0.060    0.060     1.62     1.45
    4    0.130    0.129    0.129    0.134    0.133    0.133    0.129    0.133     1.68     1.48
    8    0.283    0.433    0.331    0.460    0.453    0.454    0.349    0.456     1.35     1.26
   16    0.942    0.942    0.939    1.114    1.150    0.903    0.941    1.056     1.34     1.27
   32    1.840    1.615    1.599    2.190    2.180    2.090    1.685    2.153     1.70     1.40
   64    4.416    4.323    4.290    5.636    5.453    5.562    4.343    5.550     1.46     2.29
  128   14.552   11.566   11.381   29.090   26.732   24.610   12.500   26.811     1.88     2.63
  256   45.709   44.106   45.251   63.553   64.000   64.008   45.022   63.854     2.90     2.87
  512  117.698  117.816  117.043  151.497  153.442  151.532  117.519  152.157     2.86     2.90
 1024  301.501  290.818  289.458  347.153  345.639  345.158  293.926  345.983     2.75     2.89

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28
        System 2/1
        SP   1.0000000000  1.0000000000  1.0000000000
        DP   1.0000000000  1.0000000000  1.0000000000

       Total Elapsed Time    4.1 seconds

   

Second FFT Benchmark Results below


FFT fft3c.apk Results


 System 1 Android 10
 
 ARM/Intel FFT Benchmark 3c 4A8 22-Jul-2021 21.57
           Compiled for 64 bit ARM v8a

 Size                     milliseconds                             Average
    K    Single Precision           Double Precision              SP       DP
    1    0.065    0.054    0.054    0.056    0.050    0.050    0.058    0.052
    2    0.120    0.114    0.114    0.105    0.106    0.106    0.116    0.106
    4    0.256    0.244    0.245    0.234    0.236    0.235    0.248    0.235
    8    0.558    0.537    0.537    0.559    0.565    0.561    0.544    0.562
   16    1.251    1.224    1.218    1.380    1.284    1.276    1.231    1.313
   32    2.676    2.612    2.628    3.406    3.255    3.326    2.639    3.329
   64    5.965    5.911    5.929    9.742    9.653    9.911    5.935    9.769
  128   15.622   15.281   15.210   24.284   24.313   24.484   15.371   24.360
  256   36.663   35.968   35.950   57.401   55.287   52.135   36.194   54.941
  512   81.739  100.683  101.756  127.262  127.765  127.512   94.726  127.513
 1024  222.161  221.980  218.134  313.828  306.046  304.332  220.758  308.069

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

       Total Elapsed Time    4.0 seconds

 System 2 Android 11 and Comparisons

 ARM/Intel FFT Benchmark 3c 4A8 24-Jul-2021 22.31
           Compiled for 64 bit  ARM v8a
                                                                                  Benchmark 2
 Size                     milliseconds                              Average     Compare Sys 2/1
    K    Single Precision           Double Precision              SP       DP       SP       DP
    1    0.053    0.044    0.042    0.030    0.029    0.028    0.046    0.029     1.24     1.79
    2    0.098    0.090    0.090    0.061    0.061    0.060    0.093    0.061     1.25     1.74
    4    0.207    0.195    0.198    0.132    0.168    0.152    0.200    0.151     1.24     1.56
    8    0.451    0.425    0.458    0.324    0.323    0.323    0.445    0.323     1.22     1.74
   16    1.014    0.975    0.973    0.825    0.773    0.770    0.987    0.789     1.25     1.66
   32    1.578    1.542    1.485    1.774    1.764    1.735    1.535    1.758     1.72     1.89
   64    3.426    3.446    3.324    4.319    4.329    4.383    3.399    4.344     1.75     2.25
  128   10.467    8.002    7.957   10.141    9.951   10.000    8.809   10.031     1.74     2.43
  256   20.145   19.834   19.305   24.855   26.867   25.645   19.761   25.789     1.83     2.13
  512   44.796   43.895   42.502   60.550   60.966   60.162   43.731   60.559     2.17     2.11
 1024   98.944   96.987   96.128  142.164  141.496  139.192   97.353  140.951     2.27     2.19

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28
        System 2/1
        SP   1.0000000000  1.0000000000  1.0000000000
        DP   1.0000000000  1.0000000000  1.0000000000

       Total Elapsed Time    2.1 seconds
   


MP-Whetstone Benchmark - MP-WHETSi.apk

For more information on the Whetstone Benchmark, see the stand-alone version above. The multithreading version runs multiple copies of the same shared code, with separate variables.

Before comparing results, it should be noted that the high Fixpt MOPS figures are impossible to achieve in practice. They arise where the compiler has found that some of the code can be ignored without changing the calculated result. However, the time for this function has little effect on the overall MWIPS rating.

With mixed MHz CPU cores and big.LITTLE architectures, it is more difficult to predict performance using multithreaded benchmarks. With 8 identical cores, performance would normally nearly double each time the number of cores used is doubled. This also applied to System 1, using Cortex A73 and A53 cores, both at 2.0 GHz, possibly because their architectures are similar in executing the simple Whetstone benchmark test functions.

System 2 has two fast Kryo 480 CPUs and six slower Kryo 460 ones, the latter with clearly less advanced architecture. This led to the overall System 2/1 MWIPS comparison reducing from 1.61 at 2 threads to 1.28 at 4, then 1.04 at 8.
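The scaling behaviour can be checked directly from the MWIPS figures reported for the two systems in this section. The short sketch below computes the gain obtained each time the thread count is doubled; the function and variable names are illustrative only.

```python
# MWIPS at 1, 2, 4 and 8 threads, copied from the MP-Whetstone results
MWIPS = {
    "System 1": [2844.5, 5473.0, 11006.8, 20297.4],
    "System 2": [4326.6, 8782.2, 13968.6, 21038.9],
}


def doubling_gains(scores):
    """Gain obtained each time the thread count is doubled."""
    return [round(later / earlier, 2)
            for earlier, later in zip(scores, scores[1:])]


for name, scores in MWIPS.items():
    print(name, doubling_gains(scores))
```

System 1 stays close to 2.0 at every step, while the System 2 gains fall to about 1.6 and 1.5 once the slower cores are brought into use.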

Samples of results from my MHz monitor, taken whilst running the benchmark on System 2, are provided below. The measured frequencies are slightly higher than those in the initial specification obtained, and appear to show that appropriate frequencies were used in all cases.

 System 1 Android 10

    ARM/Intel MP-Whetstone Benchmark 4A8 22-Jul-2021 22.16
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp    Fixpt      If  Equal
                1      2      3  MOPS  MOPS     MOPS    MOPS   MOPS

1T  2844.5  572.2  549.8  488.6  90.9  46.4  32202.9  2645.1  504.4
2T  5473.0 1057.2  995.7  959.1 173.5  88.8  77646.5  4953.9  990.5
4T 11006.8 2172.5 2077.1 1937.1 344.9 178.3 117441.2 10128.7 1986.3
8T 20297.4 4233.3 4031.0 3758.6 608.0 334.9 330393.0 22268.2 3515.9

 Overall Seconds   4.76 1T,   5.05 2T,   5.06 4T,   6.11 8T

 All calculations produced consistent numeric results

         Total Elapsed Time   22.2 seconds

 System 2 Android 11
 
    ARM/Intel MP-Whetstone Benchmark 4A8 08-Aug-2021 16.43
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads
     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp    Fixpt      If  Equal
                1      2      3  MOPS  MOPS     MOPS    MOPS   MOPS

1T  4326.6 1010.2  984.1  781.9 135.0  67.5  19781.8  2975.5  746.6
2T  8782.2 1850.3 2125.6 1603.9 270.4 133.8 103019.0  5978.0 1505.0
4T 13968.6 3189.1 3372.5 2641.2 438.4 233.3 148677.8 10556.0 2473.3
8T 21038.9 4535.4 4984.9 4171.4 525.4 385.8 353966.8 20385.7 3457.6

 Overall Seconds   4.57 1T,   4.54 2T,   6.91 4T,   7.86 8T

 All calculations produced consistent numeric results

          Total Elapsed Time   24.8 seconds

 System 2/1 Comparison

     MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp    Fixpt      If  Equal

1T    1.47   1.60   1.41   1.55  1.47  1.43     1.45    0.96   1.47
2T    1.61   1.85   2.12   1.67  1.51  1.59     1.21    1.21   1.52
4T    1.28   1.38   1.59   1.40  1.24  1.34     1.08    1.07   1.24
8T    1.04   1.11   1.22   1.13  0.87  1.15     1.07    0.91   0.99

 Sample CPU MHz Measurements

  Core     0     1     2     3     4     5     6     7        
  Secs
     1  1709  1478  1805  1805  1478  1709  2035  2035 1 Core
     2  1805  1709  1805  1805  1478  1805  2035  2035

     5  1805  1709  1805  1805  1325  1805  2035  2035 2 Cores
     6  1805  1805  1805  1478  1805  1709  2035  2035

    11  1805  1805  1805  1709  1805  1805  2035  2035 4 Cores
    12  1805  1478  1478  1478  1709  1805  2035  2035

    16  1805  1805  1805  1805  1805  1805  2035  2035 8 Cores
    17  1805  1805  1805  1805  1805  1805  2035  2035
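How the MHz monitor obtains its readings is not described here. As an assumption, one common way to sample per core frequencies on Linux based systems, including Android where permissions allow, is to read the cpufreq scaling_cur_freq files under sysfs, which report kHz. The helper name and its base parameter below are hypothetical.

```python
import glob
import os
import re


def read_core_mhz(base="/sys/devices/system/cpu"):
    """Return {core_number: MHz} for cores that expose scaling_cur_freq."""
    result = {}
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        match = re.search(r"cpu(\d+)", path[len(base):])
        if match is None:
            continue
        try:
            with open(path) as handle:
                khz = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # core offline or file not readable
        result[int(match.group(1))] = khz // 1000
    return result


if __name__ == "__main__":
    # Prints an empty dict on systems without cpufreq sysfs entries
    print(dict(sorted(read_core_mhz().items())))
```

Calling the function once per second while a benchmark runs would give a table of the same general shape as the samples above.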
  



MP Dhrystone Benchmark - MP-Dhryi.apk

This benchmark does not provide reasonable increases in measured performance using multiple cores, probably because many of the variables used are shared by all threads. Results using one thread are only slightly slower than from the single core version, indicating that threading overheads were not excessive. The lack of improvement using multiple cores probably invalidates comparisons of the two systems.
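The VAX MIPS rating is conventionally obtained by dividing Dhrystones per second by 1757, the score attributed to the VAX 11/780. A short check, using Dhrystones per second figures from the results in this section, is sketched below; the function name is illustrative.

```python
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def vax_mips(dhrystones_per_second):
    """Convert Dhrystones per second to a VAX MIPS rating."""
    return round(dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND)


# One thread scores for System 1 and System 2
print(vax_mips(14969439), vax_mips(23379531))
```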

 System 1 Android 10

 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 22-Jul-2021 22.13
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.53     0.94     1.56     6.52
 Dhrystones per Second   14969439 16968330 20569253  9817217
 VAX MIPS rating             8520     9658    11707     5587

 Internal pass count correct all threads

          Total Elapsed Time   10.0 seconds

 System 2 Android 11
 
 ARM/Intel MP-Dhrystone 2 Benchmark 4A8 27-Jul-2021 21.04
           Compiled for 64 bit ARM v8a

                   Using 1, 2, 4 and 8 Threads

 Threads                        1        2        4        8
 Seconds                     0.68     1.75     3.90    14.16
 Dhrystones per Second   23379531 18244401 16418508  9040403
 VAX MIPS rating            13307    10384     9345     5145


 Internal pass count correct all threads

          Total Elapsed Time   21.2 seconds

System 2/1 Comparison

 Threads                        1        2        4        8
 VAX MIPS rating             1.56     1.08     0.80     0.92
  



NEON-Linpack-MP Benchmark - NEON-Linpacki-MP.apk

This is a multithreading version of the above. Further details and results can be found in android neon benchmarks.htm (2013) and the 2017 Android Report.

This benchmark is not generally available with the new 4A8 compilation as overall running time had increased to more than 400 seconds, on a new phone.



MP-BusSpeed Benchmark - MP-BusSpd2i.apk

This is a multithreading version of BusSpeed above but, as for other memory benchmarks, it is restricted to three memory sizes, originally representative of using L1 cache, L2 cache and RAM data. To avoid caching effects of RAM based data, this version arranges for threads to have staggered starting points, each reading all the data.

The first comparisons provided for each system are for reading all data, demonstrating changes in throughput on doubling the number of CPU cores used. At 49152 KB, RAM and bus throughput can be the limiting factor, and this can be constant on using more cores. At 12.3 KB, all cores should be accessing L1 cache based data, when variations in the speed of different cores can be significant. The latter can also influence tests at 122.9 KB, but with performance gains provided from L1 caches due to the repetitive reading by more cores.

Below are full comparisons of all measurements and an average of each row that is generally representative of all entries. The benchmark uses streamed AND functions, where performance is probably proportional to CPU MHz on the two systems, as demonstrated using one and two threads and L1 cache. Then, at 4 and 8 threads, the slow System 2 cores lead to System 1 being faster.

System 2 was mostly much faster using L2 and L3 caches. From RAM, its advantage was smaller to start with, and was lost at 8 threads, once the slower System 2 cores came into play.
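The staggered starting points can be pictured with the following minimal sketch. It is not the benchmark's code: the function names, the use of Python threads and the tiny data size are assumptions. Each thread ANDs every word of a shared array, but begins at a different offset and wraps round, so all threads read all the data without following each other through the same addresses.

```python
from concurrent.futures import ThreadPoolExecutor


def and_all_from(data, start):
    """AND every word once, beginning at index start and wrapping round."""
    result = 0xFFFFFFFF
    count = len(data)
    for step in range(count):
        result &= data[(start + step) % count]
    return result


def staggered_read(data, threads):
    """Run one reader per thread, each with its own starting offset."""
    starts = [t * len(data) // threads for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda s: and_all_from(data, s), starts))


if __name__ == "__main__":
    words = [0xFFFFFFF0 | (i & 0xF) | 0x1 for i in range(4096)]
    # Every thread should report the same value, whatever its start point
    print([hex(v) for v in staggered_read(words, 4)])
```

Python threads do not execute this loop in parallel, so the sketch shows only the access pattern and the consistency check, not the throughput behaviour measured by the benchmark.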

 System 1 Android 10

 ARM/Intel MP-BusSpd2 Benchmark 4A8 22-Jul-2021 22.12
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads     RdAll
   KB     Inc32  Inc16   Inc8   Inc4   Inc2  RdAll    Gain
 12.3 1T   6579   6925   7179   7392   7433   7698
      2T  10723  12045  12847  13687  13935  12496    1.62
      4T  18313  21611  24525  26576  27695  24033    1.92
      8T  16805  20127  41695  37245  51275  37240    1.55
122.9 1T   1282   1261   2450   3990   5987   7665
      2T   1660   1634   3201   5542   8427  11388    1.49
      4T   1901   1972   4028   7803  14406  22338    1.96
      8T   2917   3020   6161  12616  25020  32368    1.45
49152 1T    562    573   1347   2646   4712   7303
      2T    616    641   1383   2759   5263   9432    1.29
      4T    845    992   1387   2816   5609  10698    1.13
      8T    947    914   1854   4119   7329  13010    1.22
 No Errors Found
          Total Elapsed Time   55.1 seconds

System 2 Android 11

 ARM/Intel MP-BusSpd2 Benchmark 4A8 27-Jul-2021 20.49
           Compiled for 64 bit ARM v8a

   MB/Second Reading Data, 1, 2, 4 and 8 Threads      RdAll
   KB      Inc32  Inc16   Inc8   Inc4   Inc2  RdAll    Gain
 12.3 1T    7138   7264   7583   7609   7683   7588
      2T    8623  12192  13891  14498  15041  14914    1.97
      4T    8020  11035  15436  18877  22120  19132    1.28
      8T   12476  15114  28710  25108  37940  27187    1.42
122.9 1T    1857   3441   6018   7969   7287   7211
      2T    3918   7120  11024  14414  15691  15856    2.20
      4T    4740   7401  12315  17656  20651  18955    1.20
      8T    4848   8516  15255  25611  37474  33515    1.77
49152 1T     559    792   1757   3208   6009   7219
      2T     752   1120   2054   3630   7162  14022    1.94
      4T     769    942   1737   3423   7200  14738    1.05
      8T     697    905   1771   3668   7318  14452    0.98
 No Errors Found
          Total Elapsed Time   55.0 seconds

System 2/1 Comparison

   KB     Inc32  Inc16   Inc8   Inc4   Inc2  RdAll Average
 12.3 1T   1.08   1.05   1.06   1.03   1.03   0.99    1.04
      2T   0.80   1.01   1.08   1.06   1.08   1.19    1.04
      4T   0.44   0.51   0.63   0.71   0.80   0.80    0.65
      8T   0.74   0.75   0.69   0.67   0.74   0.73    0.72
122.9 1T   1.45   2.73   2.46   2.00   1.22   0.94    1.80
      2T   2.36   4.36   3.44   2.60   1.86   1.39    2.67
      4T   2.49   3.75   3.06   2.26   1.43   0.85    2.31
      8T   1.66   2.82   2.48   2.03   1.50   1.04    1.92
49152 1T   0.99   1.38   1.30   1.21   1.28   0.99    1.19
      2T   1.22   1.75   1.49   1.32   1.36   1.49    1.44
      4T   0.91   0.95   1.25   1.22   1.28   1.38    1.16
      8T   0.74   0.99   0.96   0.89   1.00   1.11    0.95

  



MP-RandMem Benchmark - MP-RndMemi.apk

This is a multithreading version of RandMem above. The most striking feature of these MP results is the apparently constant, or nearly constant, performance at all thread counts during read/write tests, over the memory area covered. This is probably because write back involves accessing RAM.

The System 2/1 performance comparisons were between 0.74 and 5.41, with the widest variations on using four or eight threads. System 2 was clearly the winner using one or two threads, in all read/write tests and at the 122.9 KB data size. System 1 was the faster on several read only tests at four and eight threads.

System 1 Android 10

 ARM/Intel MP-RndMem Benchmark 4A8 22-Jul-2021 22.14
           Compiled for 64 bit ARM v8a

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T    9927    6127    9763    7586
      2T   16033    5560   16875    5334
      4T   33574    4718   32230    4201
      8T   51899    3553   35118    3749
122.9 1T    9119    7515    3479    3644
      2T   12483    5231    4191    2194
      4T   20634    4189    5287    1392
      8T   36374    3333    7513    1645
12288 1T    8168    4727     227     178
      2T    9980    3464     403     172
      4T   11411    2540     632     108
      8T   18753    1693     848      86
 No Errors Found
          Total Elapsed Time   48.7 seconds

System 2 Android 11

 ARM/Intel MP-RndMem Benchmark 4A8 27-Jul-2021 20.51
           Compiled for 64 bit ARM v8a

  MB/Second Using 1, 2, 4 and 8 Threads
  KB       SerRD SerRDWR   RndRD RndRDWR
12.29 1T   14856   15879   14847   13791
      2T   28557   15185   27599   14283
      4T   29740   15233   29814   13809
      8T   43914   14639   33087    8970
122.9 1T   12174   12422    8374    7495
      2T   24468   12783   17755    7664
      4T   30157   12649   17826    7525
      8T   45480    8194   21182    4668
12288 1T   11517    5872     439     432
      2T   14210    5893     472     401
      4T   16404    5852     505     429
      8T   17490    4001     631     395
 No Errors Found
          Total Elapsed Time   46.7 seconds

System 2/1 Comparison

  KB       SerRD SerRDWR   RndRD RndRDWR
 12.3 1T    1.50    2.59    1.52    1.82
      2T    1.78    2.73    1.64    2.68
      4T    0.89    3.23    0.93    3.29
      8T    0.85    4.12    0.94    2.39
122.9 1T    1.34    1.65    2.41    2.06
      2T    1.96    2.44    4.24    3.49
      4T    1.46    3.02    3.37    5.41
      8T    1.25    2.46    2.82    2.84
12288 1T    1.41    1.24    1.93    2.43
      2T    1.42    1.70    1.17    2.33
      4T    1.44    2.30    0.80    3.97
      8T    0.93    2.36    0.74    4.59

  



MP-MFLOPS Benchmark - MP-MFLOPS2i.apk

The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, intended to exercise L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread uses the same calculations but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations are carried out and can be run.
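A minimal sketch of this kind of calculation follows, with each thread working on its own segment of the array. It is an illustration under stated assumptions, not the benchmark source: the constant values and function names are hypothetical, Python floats are double precision rather than single, and only the three term form quoted above is shown. As written, that expression contains eight floating point operations; the 2 and 32 operation variants are assumed to be shorter and longer chains of the same pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constants, chosen only to keep values close to 1.0
A, B, C, D, E, F = 0.000020, 0.999980, 0.000011, 1.000011, 0.000012, 0.999992


def triad_segment(x, begin, end):
    """Apply the three term expression in place to x[begin:end]."""
    for i in range(begin, end):
        x[i] = (x[i] + A) * B - (x[i] + C) * D + (x[i] + E) * F


def run_threads(x, threads):
    """Give each thread its own contiguous segment of x."""
    size = len(x)
    bounds = [(t * size // threads, (t + 1) * size // threads)
              for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(triad_segment, x, b, e) for b, e in bounds]
        for future in futures:
            future.result()  # re-raise any worker exception
    return x


if __name__ == "__main__":
    words = 3200  # 3200 words x 4 bytes = 12.8 KB in single precision
    serial = run_threads([0.999999] * words, 1)
    threaded = run_threads([0.999999] * words, 4)
    # Same result for any thread count, as in the benchmark's results check
    print(serial == threaded, serial[0])
```

Because the segments do not overlap, the numeric results are identical for any thread count, which is the property the benchmark's results check relies on.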

Based on Intel SIMD performance, with 128 bit registers and linked (fused) multiply and add, up to eight single precision floating point operations could be expected per clock cycle, or 16 GFLOPS per core at 2 GHz. System 2, at least, approached that rate, at 12.2 GFLOPS using one thread and 23.5 GFLOPS using two, around twice as fast as System 1. This also demonstrates that SIMD instructions were generated by the compiler.

The SIMD implementation also provided a System 2 performance advantage at four and eight threads, but by a smaller margin.
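The peak rate quoted above follows from simple arithmetic, sketched here. The function name is illustrative, and the calculation assumes one fused multiply and add instruction can be issued per clock cycle.

```python
def peak_gflops(register_bits, word_bits, ghz, fused_multiply_add=True):
    """Theoretical peak per core: SIMD lanes x ops per lane x clock in GHz."""
    lanes = register_bits // word_bits
    ops_per_lane = 2 if fused_multiply_add else 1
    return lanes * ops_per_lane * ghz


peak = peak_gflops(128, 32, 2.0)  # 4 lanes x 2 operations x 2.0 GHz
measured = 12178 / 1000.0  # System 2, one thread, 32 Ops/Word, 12.8 KB
print(peak, round(measured / peak, 2))
```

On these assumptions, the one thread System 2 result is roughly three quarters of the theoretical per core peak.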

 System 1 Android 10

 ARM/Intel MP-MFLOPS2 Benchmark 4A8 22-Jul-2021 22.11
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     3912    3669    2315    6414    6367    6361
 2T     3120    3397    2273   11170   11385   11459
 4T     5301    7184    2430   21738   20485   20129
 8T     8789   11002    2416   29145   29936   28507
 Results x 100000, 0 indicates ERRORS
 1T    40392   76406   99700   35218   66014   99520
 2T    40392   76406   99700   35218   66014   99520
 4T    40392   76406   99700   35218   66014   99520
 8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time   12.4 seconds

 System 2 Android 11

 ARM/Intel MP-MFLOPS2 Benchmark 4A8 27-Jul-2021 20.53
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     6977    8034    2984   12178   12139   12137
 2T    10759   10573    2814   23032   23509   23674
 4T    11813   11973    2671   26022   26173   25387
 8T    15998   14536    2442   34803   35686   34050
 Results x 100000, 0 indicates ERRORS
 1T    40392   76406   99700   35218   66014   99520
 2T    40392   76406   99700   35218   66014   99520
 4T    40392   76406   99700   35218   66014   99520
 8T    40392   76406   99700   35218   66014   99520

          Total Elapsed Time    7.4 seconds

 System 2/1 Comparison

 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1.78    2.19    1.29    1.90    1.91    1.91
 2T     3.45    3.11    1.24    2.06    2.06    2.07
 4T     2.23    1.67    1.10    1.20    1.28    1.26
 8T     1.82    1.32    1.01    1.19    1.19    1.19
 Results Comparison
 1T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 2T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 4T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 8T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000

  



NEON-MFLOPS-MP Benchmark - NEON-MFLOPS2i-MP.apk

This benchmark carries out the same calculations as MP-MFLOPS but uses hand coded NEON Intrinsic Functions. Measured maximum performance was essentially the same as that of MP-MFLOPS, with System 2 as fast or faster at all thread counts and data sizes.

 System 1 Android 10

 ARM NEON-MFLOPS2-MP Benchmark 4A8 22-Jul-2021 22.17
           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     3683    3507    2275    6485    6405    6301
 2T     1845    2692    1564    9914   10164   10025
 4T     3542    4042    2308   16358   16976   16623
 8T     5953    6765    2377   22075   25944   25316
 Results x 100000, 12345 indicates ERRORS
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    7.1 seconds

 System 2 Android 11

 ARM NEON-MFLOPS2-MP Benchmark 4A8 27-Jul-2021 20.54

           Compiled for 64 bit ARM v8a

    FPU Add & Multiply using 1, 2, 4 and 8 Threads
        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     6721    7708    2944   12811   12452   12007
 2T     6530    6343    2495   22843   23026   22570
 4T     7311    6678    2449   24362   25438   24994
 8T    11900   11942    2386   32721   32459   34292
 Results x 100000, 12345 indicates ERRORS
 1T    44934   86735   99850   36770   79897   99759
 2T    44934   86735   99850   36770   79897   99759
 4T    44934   86735   99850   36770   79897   99759
 8T    44934   86735   99850   36770   79897   99759

          Total Elapsed Time    3.9 seconds

 System 2/1 Comparison

 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T     1.82    2.20    1.29    1.98    1.94    1.91
 2T     3.54    2.36    1.60    2.30    2.27    2.25
 4T     2.06    1.65    1.06    1.49    1.50    1.50
 8T     2.00    1.77    1.00    1.48    1.25    1.35
 Results Comparison
 1T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 2T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 4T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
 8T  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000

  


OpenGL Benchmark - JavaOpenGL1.apk

Necessary for early Android devices, the benchmark does not rely on complex visual scenes or mathematical functions, the objective being to generate moderate to excessive loading via multiple simple objects. It uses all Java code, with OpenGL ES GL10 statements, to measure graphics performance in Frames Per Second (FPS). Four tests are run. The first two draw a background of 50 cubes, first as wireframes, then colour shaded. The third test views the cubes in and out of a tunnel with slotted sides and roof, also containing rotating plates. The last test adds textures to the cubes and plates. The 50 cubes are redrawn 15, 30 and 60 times, with randomised positions, colours and rotational settings. With 6 x 2 triangles per cube, minimum triangles per frame for the three sets of tests are 9000, 18000 and 36000.
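The minimum triangle counts follow directly from the figures given, as the short sketch below shows. The names are illustrative only.

```python
CUBES = 50
TRIANGLES_PER_CUBE = 6 * 2  # six faces, two triangles each


def min_triangles_per_frame(redraws):
    """Minimum triangles drawn when the cube set is redrawn this many times."""
    return CUBES * TRIANGLES_PER_CUBE * redraws


print([min_triangles_per_frame(n) for n in (15, 30, 60)])
```

The counts are minimums because the tunnel and plates in the later tests add further triangles.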

All tests are shown to be around twice as fast on System 2.

 System 1 Android 10

 Android Java OpenGL Benchmark 4A8 22-Jul-2021 22.00

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      42.94    46.30    36.80    31.23
  18000+      24.60    25.91    22.64    18.73
  36000+      13.98    14.25    13.30    10.91

      Screen Pixels 720 Wide 1339 High

      Total Elapsed Time  120.6 seconds

System 2 Android 11

 Android Java OpenGL Benchmark 4A8 25-Jul-2021 08.24

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured
 
   9000+      89.66    89.76    81.53    67.58
  18000+      56.37    56.14    48.89    39.09
  36000+      29.16    29.39    27.47    21.27

      Screen Pixels 720 Wide 1339 High

      Total Elapsed Time  120.4 seconds

 System 2/1 Comparison

 Triangles WireFrame   Shaded  Shaded+ Textured

   9000+       2.09     1.94     2.22     2.16
  18000+       2.29     2.17     2.16     2.09
  36000+       2.09     2.06     2.07     1.95
  



Java Drawing Benchmark - JavaDraw.apk

This all Java benchmark uses small to rather excessive numbers of simple objects to measure drawing performance, again via Frames Per Second (FPS). Five 10 second tests draw on a background of continuously changing colour shades.

  • Test 1 loads a PNG file twice, the bitmaps moving for each frame, side to side or circling.
  • Plus Test 2 generates 2 SweepGradient multi-coloured circles moving around.
  • Plus Test 3 draws 200 random small circles in the middle of the screen.
  • Plus Test 4 draws 80 lines from the centre of each side to the opposite side, with changing colours.
  • Plus Test 5 draws the same small random circles as Test 3, but 4000 of them, filling the screen.

System 2 performance advantage grew from 1.16 to 2.68 times with increasing drawing complexity.

 System 1 Android 10

 Android Java Drawing Benchmark 4A822-Jul-2021 21.58

 Test                            Frames     FPS

 Display PNG Bitmap Twice          597    59.66
 Plus 2 SweepGradient Circles      527    52.63
 Plus 200 Random Small Circles     455    45.45
 Plus 320 Long Lines               329    32.82
 Plus 4000 Random Small Circles    100     9.90

      Screen pixels 720 Wide 1339 High

      Total Elapsed Time   50.2 seconds

System 2 Android 11

 Android Java Drawing Benchmark 4A825-Jul-2021 08.28

 Test                            Frames     FPS

 Display PNG Bitmap Twice          695    69.49
 Plus 2 SweepGradient Circles      599    59.73
 Plus 200 Random Small Circles     772    77.12
 Plus 320 Long Lines               714    71.37
 Plus 4000 Random Small Circles    266    26.52

      Screen pixels 720 Wide 1339 High

      Total Elapsed Time   50.1 seconds

 System 2/1 Comparison

 Display PNG Bitmap Twice                  1.16
 Plus 2 SweepGradient Circles              1.13
 Plus 200 Random Small Circles             1.70
 Plus 320 Long Lines                       2.17
 Plus 4000 Random Small Circles            2.68
  



Java Whetstone Benchmark - Java Whetstone.apk

Java performed quite well on both systems, at around half the speed of the optimised compiled C version above. Relative System 2 performance gains on individual tests were also of the same order.

 System 1 Android 10

 Android Java Whetstone Benchmark 4A8 22-Jul-2021 21.57

 Test        MFLOPS    MOPS   millisecs    Results 

 N1 float    360.23             0.053  -1.124750137
 N2 float    348.73             0.385  -1.131330490
 N3 if               980.11     0.106   1.000000000
 N4 fixpt           1415.09     0.223  12.000000000
 N5 cos               83.62     0.995   0.499110132
 N6 float    191.96             2.810   0.999999821
 N7 equal            290.11     0.637   3.000000000
 N8 exp               43.51     0.855   0.935364604

 MWIPS      1649.10             6.064

 Total Elapsed Time   16.1 seconds

 System 2 Android 11

 Android Java Whetstone Benchmark 4A8 25-Jul-2021 08.32

 Test        MFLOPS    MOPS   millisecs    Results 

 N1 float    609.91             0.031  -1.124750137
 N2 float    557.68             0.241  -1.131330490
 N3 if               990.43     0.105   1.000000000
 N4 fixpt           2817.53     0.112  12.000000000
 N5 cos              136.06     0.612   0.499110132
 N6 float    271.06             1.990   0.999999821
 N7 equal            619.30     0.298   3.000000000
 N8 exp               65.67     0.567   0.935364604

 MWIPS      2528.33             3.955

 Total Elapsed Time   13.6 seconds

 System 2/1 Comparison

 N1 float      1.69                     1.000000000
 N2 float      1.60                     1.000000000
 N3 if                 1.01             1.000000000
 N4 fixpt              1.99             1.000000000
 N5 cos                1.63             1.000000000
 N6 float      1.41                     1.000000000
 N7 equal              2.13             1.000000000
 N8 exp                1.51             1.000000000

 MWIPS         1.53

  



Java Linpack Benchmark - LinpackJava.apk

The Java version carries out double precision floating point calculations. The System 2 results for the C version are repeated below, where the sumcheck values show that it was executing identical arithmetic calculations, but at more than twice the speed of Java.

System 2 was nearly twice as fast as System 1 via the Java route.

System 1 Android 10

 Android Java Linpack Benchmark 4A8 22-Jul-2021 22.11

 Speed              461.60 MFLOPS

 norm. resid                1.67
 resid            7.41628980e-14
 machep           2.22044605e-16
 x[0]-1          -1.49880108e-14
 x[n-1]-1        -1.89848137e-14


System 2 Android 11

 Android Java Linpack Benchmark 4A8 25-Jul-2021 08.33  System 2/1
                                                       Comparison

 Speed              898.10 MFLOPS                          1.95

 norm. resid                1.67                        1.00000
 resid            7.41628980e-14                        1.00000
 machep           2.22044605e-16                        1.00000
 x[0]-1          -1.49880108e-14                        1.00000
 x[n-1]-1        -1.89848137e-14                        1.00000


 System 2 C Version

 ARM/Intel DP Linpack Benchmark 4A8 24-Jul-2021 21.00    C/Java
           Compiled for 64 bit ARM v8a                 Comparison

 Speed             1985.71 MFLOPS                          2.21

 norm. resid                 1.7                  print rounding differs
 resid            7.41628980e-14                        1.00000
 machep           2.22044605e-16                        1.00000
 x[0]-1          -1.49880108e-14                        1.00000
 x[n-1]-1        -1.89848137e-14                        1.00000
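
The machep value reported by both versions can be checked independently of the benchmark. The following is a minimal Python sketch, not taken from the benchmark source, that derives the double precision machine epsilon by repeated halving and recomputes the two speed ratios from the MFLOPS figures above. The function and constant names are illustrative assumptions.

```python
def machine_epsilon() -> float:
    """Return the gap between 1.0 and the next larger double."""
    eps = 1.0
    # Halve eps until adding half of it to 1.0 no longer changes the sum.
    while 1.0 + eps / 2.0 != 1.0:
        eps /= 2.0
    return eps


# MFLOPS figures copied from the results above.
JAVA_SYSTEM_1 = 461.60
JAVA_SYSTEM_2 = 898.10
C_SYSTEM_2 = 1985.71

print(f"machep           {machine_epsilon():.8e}")
print(f"System 2/1 Java  {JAVA_SYSTEM_2 / JAVA_SYSTEM_1:.2f}")
print(f"C/Java System 2  {C_SYSTEM_2 / JAVA_SYSTEM_2:.2f}")
```

On a platform using IEEE 754 double precision, the first line should match the 2.22044605e-16 shown in the logs, which is consistent with the calculations being carried out in double precision.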
  



DriveSpeed Benchmarks - DriveSpd1.apk

DriveSpeed carries out four tests.

Test 1 - Write and read three 8 and 16 MB files; Results given in MBytes/second
Test 2 - Write three 8 MB files, read can be cached in RAM; Results given in MBytes/second
Test 3 - Random write and read 1 KB from 4 to 16 MB; Results are average time in milliseconds
Test 4 - Write and read 200 files 4 KB to 16 KB; Results in MB/sec, msecs/file and delete seconds.
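
As an illustration of the kind of measurement made by Test 1, the following Python sketch writes and reads back one large file and reports MBytes/second. It is a simplified assumption of the approach, not the DriveSpeed code: the function name, block size and file name are hypothetical, and the read will normally be served from the operating system's file cache, as with the cached results reported here.

```python
import os
import tempfile
import time


def write_read_mbps(path: str, size_mb: int, block_kb: int = 1024):
    """Write then read one file of size_mb MB; return (write, read) in MB/s."""
    block = bytes(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the drive
    write_secs = time.perf_counter() - start

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_kb * 1024):
            total += len(chunk)
    read_secs = time.perf_counter() - start

    if total != blocks * block_kb * 1024:
        raise IOError("file size read back does not match size written")
    return size_mb / write_secs, size_mb / read_secs


def demo() -> None:
    """Time 8 MB and 16 MB files in a temporary directory."""
    with tempfile.TemporaryDirectory() as folder:
        for size_mb in (8, 16):
            path = os.path.join(folder, "drivetest.bin")
            write, read = write_read_mbps(path, size_mb)
            print(f"{size_mb:4d} MB  write {write:8.1f}  read {read:8.1f} MB/s")


if __name__ == "__main__":
    demo()
```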

The benchmark has two run buttons: RunI to test the internal drive and RunS for an SD card. With RunI, the code that used Direct I/O to avoid caching no longer works, but fully cached results can still be useful. RunS originally worked using a default file path that no longer applies.

A More button is provided to allow uncached reading speed measurements: select More/Don’t Delete before RunI to keep the large files, power the phone off and on, then select More/Read Only (plus Don’t Delete if still required) and RunI.

Fully cached and read only results for the two systems are provided below, mainly to demonstrate that the programs worked under these versions of Android. There are indications that System 2 was faster in certain areas, but a number of runs, including some using the same version of Android, would be required to confirm this.

 System 1 Android 10
 Internal Drive MB   51050 Free   40895

 Android DriveSpeed1 Benchmark 4A8 28-Jul-2021 21.04
           Internal Drive Data Cached
           Compiled for 64 bit ARM v8a

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8     840.5 1049.7 1260.5 2146.2 1950.5 2314.2
  16     757.5  838.3 1040.0 1995.7 2186.2 2094.4
 Cached
   8     793.8  581.6  595.7 1937.8 2184.7 2012.3

 Random      Write                Read
 From MB     4      8     16      4      8     16
 msecs    0.30   0.21   0.22   0.00   0.00   0.00

 200 Files   Write                Read            Delete 
 File KB     4      8     16      4      8     16   secs 
 MB/sec  39.32  80.77 101.42 218.89 345.10 318.04  
 msecs    0.10   0.10   0.16   0.02   0.02   0.05  0.019
 Files Deleted

          Total Elapsed Time   16.4 seconds

 System 2 Android 11
 Internal Drive MB   46183 Free   38203

 Android DriveSpeed1 Benchmark 4A8 28-Jul-2021 20.49
           Internal Drive Data Cached
           Compiled for 64 bit ARM v8a

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8    1622.4 1756.0 1750.5 2215.9 2564.1 2913.6
  16    1683.2 1618.8 1242.2 2260.4 2642.7 2298.9
 Cached
   8     872.6 1482.7 1640.6 2118.8 2918.7 3021.2

 Random      Write                Read
 From MB     4      8     16      4      8     16
 msecs    0.38   0.45   0.46   0.00   0.00   0.00

 200 Files   Write                Read            Delete 
 File KB     4      8     16      4      8     16   secs 
 MB/sec  69.17  99.34  58.97 533.87 762.62 489.21  
 msecs    0.06   0.08   0.28   0.01   0.01   0.03  0.009
 Files Deleted

          Total Elapsed Time   16.3 seconds

 System 1 Read Only

                     MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8       0.0    0.0    0.0  273.7  278.9  253.6

 System 2 Read Only

                      MBytes/Second
  MB    Write1 Write2 Write3  Read1  Read2  Read3
   8       0.0    0.0    0.0  136.7  380.5  450.4


  



CPU Stress Tests - MP-FPU-Stress.apk, MP-Int-Stress.apk, CP_MHz2.apk

USE AT YOUR OWN RISK

There are two main stress test programs that can use multiple threads to exercise (presently) all CPU cores, one using floating point instructions and the other carrying out integer arithmetic. Further detail is covered in the earlier report, Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM and Intel.pdf, with an update in a 2018 publication. The third program monitors MHz of up to 8 cores. Each of the stress test applications has five buttons:

RunB - Run Benchmark - Runs most combinations of number of threads, data sizes and calculations per data word for the FPU tests. This is mainly to help to decide which options to use for stress testing. The benchmark runs using fixed parameters, carrying out exactly the same number of calculations using all thread combinations and data sizes. The pass count changes according to the number of calculations per word, for the FPU tests.

RunS - Run Stress Tests - Default running time is 15 minutes, with the middle data size, intended for containment in L2 cache, using 8 threads, and 32 operations per word in the FPU tests.

False Errors - These can be caused if the run button is tapped again while the tests are running. The main distinguishing symptom is the display of multiple “End Time” messages.

SetS - Specify run time parameters for stress test - These are 1, 2, 4, 8, 16 or 32 threads; 2, 8 or 32 operations per word for FPU tests; data sizes of 12.8 or 16 KB, 128 or 160 KB, 12.8 or 16 MB for FPU or Integer tests respectively; and running time in minutes.

Info - Test description and details - This is essentially the same as details provided here.

Save - This provides alternative methods to divert the logged output. Currently I select the Google Drive option, allowing me to access the files on my PCs.

Unexpected Faster Speed - Performance depends on whether the data comes from caches or RAM. Increasing the number of threads can therefore raise speed by more than expected, as it can lead to CPU cores using their own dedicated, smaller and faster caches.

Sumchecks - The programs include sumchecks to show whether the arithmetic calculations produced correct results, as shown for the benchmark results. For integers, each test section uses a different data pattern for all words, checked by the program after manipulation. Floating point numeric results depend on the number of calculations carried out, which is constant for the reported stress test time slots, so they are easily verified manually.
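
The integer checking described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the stress test's code: the patterns are those appearing in the Sumcheck column of the results in this report, but the manipulation used here (inverting every word twice) and all names are hypothetical stand-ins for whatever the real program does.

```python
# Data patterns as seen in the Sumcheck column of the reported results.
PATTERNS = (0x00000000, 0xFFFFFFFF, 0x5A5A5A5A,
            0xAAAAAAAA, 0xCCCCCCCC, 0x0F0F0F0F)
MASK = 0xFFFFFFFF


def run_section(pattern, words=1024, passes=4, fault_at=None):
    """Fill all words with one pattern, manipulate them, then verify.

    Returns (sumcheck, all_same). fault_at optionally flips one bit in one
    word to imitate a hardware error, so the check can be seen to fail.
    """
    data = [pattern] * words
    for _ in range(passes):
        for i in range(words):
            # Hypothetical manipulation: invert the word, then invert it
            # back, so a correct run leaves the pattern unchanged.
            data[i] = (data[i] ^ MASK) ^ MASK
    if fault_at is not None:
        data[fault_at] ^= 1
    sumcheck = data[0]
    all_same = all(word == pattern for word in data)
    return sumcheck, all_same


for pattern in PATTERNS:
    sumcheck, all_same = run_section(pattern)
    print(f"{sumcheck:08X}  {'Yes' if all_same else 'No'}")
```

Calling run_section(0xAAAAAAAA, fault_at=10) returns False for the second value, corresponding to a "No" in the Same All column.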

CP_MHz2 measurements are instantaneous values, taken at a constant sampling interval, not averages over that interval. The program has Set, Run and Save buttons, as above. Default running time is 15 minutes, with samples every 10 seconds.
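
How CP_MHz2 obtains its readings is not described here. One common way to sample per-core clock speeds on Linux based systems, including Android, is to read the cpufreq scaling_cur_freq files under /sys/devices/system/cpu, which hold the current frequency in kHz. The Python sketch below assumes that method; the function names are hypothetical, and access to these files may be restricted on some devices, in which case a reading is returned as None.

```python
import time
from pathlib import Path

CPU_ROOT = "/sys/devices/system/cpu"


def read_core_mhz(core, root=CPU_ROOT):
    """Return the instantaneous MHz of one core, or None if unavailable."""
    path = Path(root) / f"cpu{core}" / "cpufreq" / "scaling_cur_freq"
    try:
        return int(path.read_text().strip()) // 1000  # file holds kHz
    except (OSError, ValueError):
        return None


def sample_mhz(cores=8, samples=2, interval_secs=10.0, root=CPU_ROOT):
    """Take instantaneous readings for all cores at a fixed interval."""
    rows = []
    start = time.monotonic()
    for n in range(samples):
        if n:
            time.sleep(interval_secs)
        elapsed = time.monotonic() - start
        rows.append((elapsed, [read_core_mhz(c, root) for c in range(cores)]))
    return rows


if __name__ == "__main__":
    for secs, mhz in sample_mhz(samples=2, interval_secs=1.0):
        cells = " ".join(f"{m:5d}" if m is not None else "  n/a" for m in mhz)
        print(f"{secs:6.2f} {cells}")
```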

Example results of the Stress Test Benchmarks are provided later, followed by those from extended reliability type tests. The stress test results are from logs of runs using default parameters, with 15 minutes running time, and some include only the necessary detail. Examples of full output are as follows.

  ARM/Intel MP-Int Stress Test 4A8 25-Aug-2021 20.04.49
            Compiled for 64 bit ARM v8a

            Data                         Same All
 Seconds    Size Threads  MB/sec Sumcheck Threads

    8.9   160 KB     8     56504 00000000  Yes
   17.7   160 KB     8     55513 00000000  Yes


  ARM/Intel MP-FPU Stress Test 4A8 25-Aug-2021 19.08.22
            Compiled for 64 bit ARM v8a

            Data            Ops/          Nmeric
 Seconds    Size Threads    Word  MFLOPS Results

    8.7   128 KB       8      32   38035   35216
   17.1   128 KB       8      32   37603   35216
  

As seen via the CPU-Z utility app, core MHz values can change at extremely rapid rates. CP_MHz2.apk samples them at a selected interval in seconds, so each value is a representative instantaneous reading, not an average. Example output:

  MHz Measurement Test 4A8 25-Aug-2021 19.08.40
  Running time 16 minutes, 30 second samples

                       MHz for Core
  Secs     0     1     2     3     4     5     6     7

  0.00  1478  1478  1709  1478  1478  1190  2035  1402
 30.13  1805  1805  1805  1805  1805  1805  2035  2035
  




Integer Stress Test Benchmark

Measured performance was similar to earlier tests, such as MP-RandMem Serial Read, but showed improved throughput using more than eight threads. Maximum single core integer MOPS (Million Operations Per Second) would be around 2400 for System 1 and 3800 for System 2, the latter in particular suggesting SIMD activity.
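
The MOPS estimates quoted above appear consistent with dividing the single thread, 16 KB MB/second results by four, that is, assuming 4-byte integer words with one operation counted per word. That interpretation is an assumption, not something stated in the program output. A short Python sketch of the arithmetic, with hypothetical names:

```python
BYTES_PER_WORD = 4  # assumption: 32-bit integer data words


def mops_from_mb_per_sec(mb_per_sec, bytes_per_word=BYTES_PER_WORD):
    """Convert MB/second to million word operations per second."""
    return mb_per_sec / bytes_per_word


# Single thread, 16 KB results from the two benchmark logs.
for name, mb_per_sec in (("System 1", 9709), ("System 2", 15241)):
    print(f"{name}  {mb_per_sec:6d} MB/sec  {mops_from_mb_per_sec(mb_per_sec):7.1f} MOPS")
```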

The usual relative performance attributes are shown to apply. With cache based data, System 2 is indicated as much faster using 1 or 2 threads, then possibly slower at 4 and 8.

 System 1 Integer Stress Test Android 10

  ARM/Intel MP-Int Stress Test 4A8 22-Jul-2021 22.20.25
            Compiled for 64 bit ARM v8a

                 MB/second 
               KB    KB    MB            Same All
  Secs Thrds   16   160    16  Sumcheck   Tests

   2.7   1   9709  9290  8314  00000000    Yes
   1.8   2  18282 13642 11112  FFFFFFFF    Yes
   1.3   4  29213 31022 10590  5A5A5A5A    Yes
   1.2   8  42274 37461 10819  AAAAAAAA    Yes
   1.2  16  39014 41492 10944  CCCCCCCC    Yes
   1.0  32  42745 44809 12595  0F0F0F0F    Yes

            End Time 22-Jul-2021 22.20.37

 System 2 Integer Stress Test Android 11

 ARM/Intel MP-Int Stress Test 4A8 25-Jul-2021 15.43.50
            Compiled for 64 bit ARM v8a

                 MB/second 
               KB    KB    MB            Same All
  Secs Thrds   16   160    16  Sumcheck   Tests

   1.7   1  15241 14764 12857  00000000    Yes
   1.2   2  27887 28069 12937  FFFFFFFF    Yes
   1.2   4  27059 32994 13011  5A5A5A5A    Yes
   1.1   8  40754 44941 12292  AAAAAAAA    Yes
   1.0  16  44902 45542 12959  CCCCCCCC    Yes
   0.9  32  45368 49046 14093  0F0F0F0F    Yes

            End Time 25-Jul-2021 15.44.01

 System 2/1 Comparison

               KB    KB    MB         
       Thrds   16   160    16  Sumcheck

         1   1.57  1.59  1.55    SAME
         2   1.53  2.06  1.16    SAME
         4   0.93  1.06  1.23    SAME
         8   0.96  1.20  1.14    SAME
        16   1.15  1.10  1.18    SAME
        32   1.06  1.09  1.12    SAME
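
The comparison table can be reproduced from the two MB/second tables. The following Python sketch does this, with the figures copied from the logs above and illustrative variable names.

```python
# MB/second at 16 KB, 160 KB and 16 MB, keyed by thread count.
SYSTEM_1 = {
    1: (9709, 9290, 8314),
    2: (18282, 13642, 11112),
    4: (29213, 31022, 10590),
    8: (42274, 37461, 10819),
    16: (39014, 41492, 10944),
    32: (42745, 44809, 12595),
}
SYSTEM_2 = {
    1: (15241, 14764, 12857),
    2: (27887, 28069, 12937),
    4: (27059, 32994, 13011),
    8: (40754, 44941, 12292),
    16: (44902, 45542, 12959),
    32: (45368, 49046, 14093),
}


def ratios(new, old):
    """Return new/old speed ratios per thread count, rounded to 2 places."""
    return {
        threads: tuple(round(n / o, 2) for n, o in zip(new[threads], old[threads]))
        for threads in old
    }


for threads, row in ratios(SYSTEM_2, SYSTEM_1).items():
    print(f"{threads:10d}  " + "  ".join(f"{r:4.2f}" for r in row))
```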
  




Floating Point Stress Test Benchmark

This program uses the same C code as MP-MFLOPS, with the addition of tests using 8 floating point calculations per data word read/written. Performance was also similar, including variations with multithreaded activity, apparent in results from multiple runs.
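
To illustrate what 2 or 8 floating point calculations per data word means, the Python sketch below applies a multiply-add style expression to every word and derives MFLOPS from the operation count. The expressions, constants and names are assumptions for illustration and are not copied from the MP-MFLOPS source. The 3200 word default corresponds to 12.8 KB only if words are 4-byte single precision values, which is also an assumption.

```python
import time


def kernel_2(x, a, b):
    """2 operations per word: one add and one multiply."""
    for i in range(len(x)):
        x[i] = (x[i] + a) * b


def kernel_8(x, a, b, c, d, e, f):
    """8 operations per word: 3 adds, 3 multiplies, 1 subtract, 1 add."""
    for i in range(len(x)):
        v = x[i]
        x[i] = (v + a) * b - (v + c) * d + (v + e) * f


def mflops(words, ops_per_word, passes, seconds):
    """Million floating point operations per second."""
    return words * ops_per_word * passes / seconds / 1e6


def timed_run(words=3200, passes=20):
    """Run kernel_8 repeatedly; return (MFLOPS, first word as numeric result)."""
    x = [0.999999] * words
    # Hypothetical constants, chosen so that values stay bounded.
    a, b, c, d, e, f = 0.00002, 0.999980, 0.00001, 1.000010, 0.00003, 0.999970
    start = time.perf_counter()
    for _ in range(passes):
        kernel_8(x, a, b, c, d, e, f)
    seconds = time.perf_counter() - start
    return mflops(words, 8, passes, seconds), x[0]


if __name__ == "__main__":
    speed, result = timed_run()
    print(f"{speed:8.1f} MFLOPS  numeric result {result:.6f}")
```

Because the final value depends only on the starting data and the number of passes, repeated runs with the same parameters should give the same numeric result, which is the property used for the result checks in the tables.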

Again, at 12.8 and 128 KB, System 2 was much faster using 1 or 2 threads, but not so with more than 2.

 System 1 FPU Stress Test Android 10

  ARM/Intel MP-FPU Stress Test 4A8 23-Aug-2021 12.36.52
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.7   T1   2  2997  2873  2289   40392  76406  99700
   0.6   T2   2  6242  4804  2006   40392  76406  99700
   0.5   T4   2  6176  8232  2295   40392  76406  99700
   0.4   T8   2 10243  9381  2326   40392  76406  99700
   1.8   T1   8  4653  4178  3885   54760  85092  99819
   1.0   T2   8  9244  7870  6270   54760  85092  99819
   0.7   T4   8 13161 13388  9711   54760  85092  99819
   0.6   T8   8 19360 18880  9449   54760  85092  99819
   5.0   T1  32  6229  6289  6183   35218  66014  99520
   2.6   T2  32 11883 11629 12316   35218  66014  99520
   1.6   T4  32 19452 17117 24152   35218  66014  99520
   1.3   T8  32 25532 21875 27148   35218  66014  99520

            End Time 23-Aug-2021 12.39.53

 System 2 FPU Stress Test Android 11

 ARM/Intel MP-FPU Stress Test 4A8 25-Jul-2021 15.43.14
            Compiled for 64 bit ARM v8a

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
  Secs Thrd Word 12.8   128  12.8    12.8    128   12.8

   0.4   T1   2  7485  7951  2864   40392  76406  99700
   0.4   T2   2 10662  7114  2458   40392  76406  99700
   0.4   T4   2 11787  8245  2335   40392  76406  99700
   0.4   T8   2 12922 11234  2349   40392  76406  99700
   0.7   T1   8 11687 11698  9744   54760  85092  99819
   0.6   T2   8 18046 15611 10570   54760  85092  99819
   0.6   T4   8 16452 14787 10212   54760  85092  99819
   0.5   T8   8 23830 23385  9278   54760  85092  99819
   2.5   T1  32 12156 12491 12408   35218  66014  99520
   1.3   T2  32 23189 23429 23292   35218  66014  99520
   1.2   T4  32 22673 25410 27226   35218  66014  99520
   1.0   T8  32 28894 35044 29383   35218  66014  99520

            End Time 25-Jul-2021 15.43.31

 System 2/1 Comparison

                       MFLOPS          Numeric Results
            Ops/   KB    KB    MB      KB     KB     MB
       Thrd Word 12.8   128  12.8    12.8    128   12.8

         T1   2  2.50  2.77  1.25  1.0000 1.0000 1.0000
         T2   2  1.71  1.48  1.23  1.0000 1.0000 1.0000
         T4   2  1.91  1.00  1.02  1.0000 1.0000 1.0000
         T8   2  1.26  1.20  1.01  1.0000 1.0000 1.0000
         T1   8  2.51  2.80  2.51  1.0000 1.0000 1.0000
         T2   8  1.95  1.98  1.69  1.0000 1.0000 1.0000
         T4   8  1.25  1.10  1.05  1.0000 1.0000 1.0000
         T8   8  1.23  1.24  0.98  1.0000 1.0000 1.0000
         T1  32  1.95  1.99  2.01  1.0000 1.0000 1.0000
         T2  32  1.95  2.01  1.89  1.0000 1.0000 1.0000
         T4  32  1.17  1.48  1.13  1.0000 1.0000 1.0000
         T8  32  1.13  1.60  1.08  1.0000 1.0000 1.0000

  



Integer Stress Tests

Following are results from 15 minute tests at 160 KB and 8 threads. MHz samples were at 30 second intervals, with average measured MB/second over an approximately aligned time slot. These were side by side tests on the two phones at 25°C room temperature.

Phone 1, with the older technology, suffered a reduction in performance of around 20%. The MHz samples identify thermal throttling as the cause, with average core MHz falling by about 25%.

Phone 2 appeared to run continuously with all cores at maximum MHz and at effectively constant performance, so its advantage over Phone 1 grew from 12% at the start to 41% at the end.
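
The percentages quoted above follow from the first and last rows of the two tables below. A short Python sketch of the arithmetic, using the Start and 900 second figures and illustrative names, also shows how the Average column is formed from the eight core readings.

```python
def pct_change(old, new):
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


def average_mhz(core_mhz):
    """Mean of the per-core MHz samples in one table row."""
    return sum(core_mhz) / len(core_mhz)


# MB/second at Start and at 900 seconds for each phone.
P1_START, P1_END = 50536, 40190
P2_START, P2_END = 56504, 56828

# Phone 1 core MHz at 30 and 900 seconds.
P1_MHZ_30 = [1989] * 8
P1_MHZ_900 = [910, 1625, 1846, 1989, 1417, 1417, 1417, 1417]

print(f"Phone 1 MB/sec change   {pct_change(P1_START, P1_END):6.1f}%")
print(f"Phone 1 MHz change      "
      f"{pct_change(average_mhz(P1_MHZ_30), average_mhz(P1_MHZ_900)):6.1f}%")
print(f"Phone 2 lead at start   {pct_change(P1_START, P2_START):6.1f}%")
print(f"Phone 2 lead at end     {pct_change(P1_END, P2_END):6.1f}%")
```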

 System 1 Android 10
                                    MHz for Core
     Secs MB/sec     0      1      2      3      4      5      6      7 Average

  Start   50536
      30  50280   1989   1989   1989   1989   1989   1989   1989   1989  1989.0
      60  51753   1989   1989   1989   1989   1989   1989   1989   1989  1989.0
      90  48929   1248   1417   1014   1326   1989   1989   1989   1989  1620.1
     120  46032   1989   1989   1989   1989   1846   1846   1846   1846  1917.5
     150  47259   1989   1989   1989   1989   1846   1846   1846   1846  1917.5
     180  44773   1989   1989   1989   1989   1846   1846   1846   1846  1917.5
     210  44937   1014   1989   1131   1989   1417   1417   1417   1417  1473.9
     240  43210   1248   1326   1417   1417   1781   1781   1781   1781  1566.5
     270  45773    910   1326   1326   1417   1989   1989   1989   1989  1616.9
     300  44227   1989   1989   1989   1989   1716   1716   1716   1677  1847.6
     330  43423   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     360  44751   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     390  43341   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     420  44706   1989   1989   1989   1989   1508   1417   1417   1417  1714.4
     450  43342   1989   1989   1989   1989   1508   1508   1417   1417  1725.8
     480  43055   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     510  41329   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     540  41808   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     570  42219   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     600  41529   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     630  42248   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     660  41451   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     690  40210   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     720  40491   1989   1989   1989   1989   1131   1131   1131   1131  1560.0
     750  43947   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     780  43625   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     810  41807   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     840  40617   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     870  40879   1924   1924   1924   1924   1248   1248   1248   1248  1586.0
     900  40190    910   1625   1846   1989   1417   1417   1417   1417  1504.8

 System 2 Android 11

                                    MHz for Core
    Secs MB/sec      0      1      2      3      4      5      6      7 Average

   Start  56504
      30  56784   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
      60  56801   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
      90  56836   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     120  57038   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     150  56999   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     180  56313   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     210  56803   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     240  51659   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     270  56591   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     300  55605   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     330  56918   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     360  56549   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     390  57166   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     420  56985   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     450  57127   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     480  56321   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     510  52377   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     540  56553   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     570  56935   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     600  56567   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     630  56971   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     660  56653   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     690  56682   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     720  56555   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     750  56010   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     780  56752   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     810  56862   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     840  56901   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     870  56852   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     900  56828   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
  



Floating Point Stress Tests

These were also run for 15 minutes using 8 threads, but with 128 KB data. The testing arrangements were as used for the integer exercise. Performance is measured in MFLOPS.

Phone 1 suffered a 17% to 18% reduction in both measured MFLOPS and average core MHz. Again, Phone 2 appeared to run continuously at maximum speed, with performance gains over Phone 1 of 21% at the start, increasing to 46% after 15 minutes.

 System 1 Android 10

                                    MHz for Core
    Secs MFLOPS      0      1      2      3      4      5      6      7 Average

   Start  31309
      30  27427   1989   1989   1989   1989   1989   1989   1989   1989  1989.0
      60  30330   1989   1989   1989   1989   1846   1846   1846   1846  1917.5
      90  27311   1989   1989   1989   1989   1846   1846   1846   1846  1917.5
     120  26744   1989   1989   1989   1989   1989   1989   1989   1989  1989.0
     150  27714   1989   1989   1989   1989   1716   1716   1716   1716  1852.5
     180  26317   1989   1989   1989   1989   1248   1248   1248   1248  1618.5
     210  26750   1989   1989   1989   1989   1625   1625   1625   1625  1807.0
     240  27494   1989   1989   1989   1989   1625   1625   1625   1625  1807.0
     270  26435   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     300  24936   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     330  26723   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     360  26770   1989   1989   1989   1989   1508   1508   1508   1508  1748.5
     390  26950   1989   1677   1014   1989   1508   1508   1508   1508  1587.6
     420  26661   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     450  26232   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     480  26988   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     510  25936   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     540  25953   1248   1417   1248   1248   1417   1417   1417   1417  1353.6
     570  25431   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     600  26234   1248   1131   1014   1326   1989   1989    793   1989  1434.9
     630  26008   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     660  26146   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     690  26144   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     720  25600   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     750  25470   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     780  25466   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     810  24963   1989   1989   1989   1989   1326   1326   1326   1326  1657.5
     840  25516   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     870  25335   1989   1989   1989   1989   1417   1417   1417   1417  1703.0
     900  25738   1989   1989   1989   1989   1326   1326   1326   1326  1657.5

 System 2 Android 11

                                    MHz for Core
    Secs MFLOPS      0      1      2      3      4      5      6      7 Average

   Start  38035
      30  37416   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
      60  37635   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
      90  37849   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     120  37581   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     150  37826   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     180  37793   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     210  37668   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     240  37791   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     270  37894   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     300  37456   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     330  37587   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     360  37568   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     390  37568   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     420  37709   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     450  37619   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     480  37452   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     510  37762   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     540  37935   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     570  37803   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     600  37684   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     630  37890   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     660  37818   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     690  37874   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     720  37569   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     750  37604   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     780  37764   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     810  37675   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     840  37678   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     870  37525   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
     900  37652   1805   1805   1805   1805   1805   1805   2035   2035  1862.5
  



More Integer Stress Tests

Following are first and last results of 15 minute stress tests using 2, 4 and 32 threads. System 2, with 2/6 fast/slow big.LITTLE cores, demonstrated its most impressive performance gains using two threads, with both phones running continuously at maximum MHz speed. With four threads, System 1 appeared to be slightly faster to start with, but lost the lead over time, with MHz throttling. The 32 thread tests were using RAM based data, with both running at maximum speed and System 2 indicating a 22% improvement.

            Data                      Same All    System
 Seconds    Size Threads  MB/sec Sumcheck Threads   2/1

 System 1 Android 10 - 2 Threads

   10.0   160 KB     2     18837 00000000  Yes
   19.7   160 KB     2     18613 00000000  Yes

  892.3   160 KB     2     18815 AAAAAAAA  Yes
  901.9   160 KB     2     18821 AAAAAAAA  Yes

 System 2 Android 11 - 2 Threads

    9.9   160 KB     2     30768 00000000  Yes
   19.5   160 KB     2     30747 00000000  Yes     1.65

  893.6   160 KB     2     30763 AAAAAAAA  Yes
  903.2   160 KB     2     30758 AAAAAAAA  Yes     1.63


 System 1 Android 10 - 4 Threads

    9.2   160 KB     4     38542 00000000  Yes
   18.1   160 KB     4     38418 00000000  Yes

  894.0   160 KB     4     31946 AAAAAAAA  Yes
  904.6   160 KB     4     32553 AAAAAAAA  Yes

 System 2 Android 11 - 4 Threads

    9.5   160 KB     4     36361 00000000  Yes
   18.7   160 KB     4     36326 00000000  Yes     0.95

  891.4   160 KB     4     36342 AAAAAAAA  Yes
  900.6   160 KB     4     36329 CCCCCCCC  Yes     1.12


 System 1 Android 10 - 32 Threads

   11.0    16 MB    32     12723 00000000  Yes
   21.0    16 MB    32     13004 00000000  Yes

  896.7    16 MB    32     12233 5A5A5A5A  Yes
  907.4    16 MB    32     12305 5A5A5A5A  Yes

 System 2 Android 11 - 32 Threads

   10.7    16 MB    32     15301 00000000  Yes
   20.5    16 MB    32     15851 00000000  Yes     1.22

  890.3    16 MB    32     15083 5A5A5A5A  Yes
  900.7    16 MB    32     14971 5A5A5A5A  Yes     1.22

  


More Floating Point Stress Tests

These were run using the same profile as the integer stress tests, but with more significant advantages by System 2. The latter also continued to run at constant core MHz speeds, but System 1 suffered from MHz throttling in all cases.


            Data            Ops/          Nmeric  System
 Seconds    Size Threads    Word  MFLOPS Results    2/1

 System 1 Android 10 - 2 Threads

    8.0   128 KB       2       2    6918   40015
   15.8   128 KB       2       2    6942   40015

  893.4   128 KB       2       2    6021   40015
  901.9   128 KB       2       2    6378   40015


 System 2 Android 11 - 2 Threads

    5.7   128 KB       2       2   17933   40015
   11.4   128 KB       2       2   17939   40015   2.58

  899.8   128 KB       2       2   17920   40015
  905.7   128 KB       2       2   17203   40015   2.70


 System 1 Android 10 - 4 Threads

    9.4   128 KB       4      32   25153   35216
   18.5   128 KB       4      32   24895   35216

  896.5   128 KB       4      32   20789   35216
  907.6   128 KB       4      32   20266   35216

 System 2 Android 11 - 4 Threads

   10.0   128 KB       4      32   27627   35216   
   19.7   128 KB       4      32   27632   35216   1.11

  899.9   128 KB       4      32   27684   35216
  911.1   128 KB       4      32   23746   35216   1.17


 System 1 Android 10 - 32 Threads

    8.8  12.8 MB      32      32   32893   88227
   17.9  12.8 MB      32      32   30370   88227

  891.0  12.8 MB      32      32   20635   88227
  903.9  12.8 MB      32      32   21551   88227


 System 2 Android 11 - 32 Threads

    8.8  12.8 MB      32      32   37276   86674
   17.4  12.8 MB      32      32   37221   86674   1.23

  894.1  12.8 MB      32      32   37619   86674
  902.6  12.8 MB      32      32   37366   86674   1.73
  